M. G. KENDALL, Sc.D.


A  COURSE  IN 

THE  GEOMETRY  OF 
n  DIMENSIONS 


BEING 
NUMBER  EIGHT 


GRIFFIN'S   STATISTICAL 
MONOGRAPHS  &  COURSES 

EDITED    BY    M.    G.    KENDALL,    Sc.D. 




PUBLISHERS'  NOTE 

The  series  in  which  this  title  appears  was  introduced  by  the 
publishers  in  1957  and  is  under  the  general  editorship  of  Maurice 
G.  Kendall,  Sc.D.,  formerly  Professor  of  Statistics  in  the  University 
of  London.  It  is  intended  to  fill  a  need  which  has  been  evident  for 
some  time  and  is  likely  to  grow :  the  need  for  some  form  of  publica- 
tion at  moderate  cost  which  will  make  accessible  to  a  group  of 
readers  specialized  studies  in  statistics  or  special  courses  on  par- 
ticular statistical  topics.  There  are  numerous  cases  where,  for 
example,  a  monograph  on  some  newly  developed  field  would  be 
very  useful,  but  the  subject  has  not  reached  the  stage  where  a 
comprehensive  book  is  possible ;  or,  again,  where  a  course  of  study 
is  desired  in  a  domain  not  covered  by  textbooks  but  where  an 
exhaustive  treatment,  even  if  possible,  would  be  expensive  and 
perhaps  too  elaborate  for  the  readers'  needs. 

Considerable  attention  has  been  given  to  the  problem  of  pro- 
ducing these  books  speedily  and  economically.  Appearing  in  a 
cover  the  design  of  which  will  be  standard,  the  contents  of  each 
volume  will  follow  a  simple,  straightforward  layout,  the  text  pro- 
duction method  adopted  being  suited  to  the  complexity  or  otherwise 
of  the  subject. 

The  publishers  will  be  interested  in  approaches  from  any  authors 
who  have  work  of  importance  suitable  for  the  series. 

CHARLES  GRIFFIN  &  CO.  LTD. 


GRIFFIN  BOOKS  ON  STATISTICS,  &c. 

A statistical primer    F. N. DAVID
An introduction to the theory of statistics    G. U. YULE and M. G. KENDALL
The advanced theory of statistics (three volumes)    M. G. KENDALL and A. STUART
Rank correlation methods    M. G. KENDALL
Exercises in theoretical statistics    M. G. KENDALL
Rapid statistical calculations    M. H. QUENOUILLE
Combinatorial chance    F. N. DAVID and D. E. BARTON
Biomathematics    C. A. B. SMITH
The design and analysis of experiment    M. H. QUENOUILLE
Sampling methods for censuses and surveys    F. YATES
Statistical method in biological assay    D. J. FINNEY
The mathematical theory of epidemics    N. T. J. BAILEY
Probability and the weighing of evidence    I. J. GOOD


Griffin's  Statistical  Monographs  and  Courses: 
No. 1: The analysis of multiple time-series    M. H. QUENOUILLE
No. 2: A course in multivariate analysis    M. G. KENDALL
No. 3: The fundamentals of statistical reasoning    M. H. QUENOUILLE
No. 4: Basic ideas of scientific sampling    A. STUART
No. 5: Characteristic functions    E. LUKACS
No. 6: An introduction to infinitely many variates    E. A. ROBINSON
No. 7: Mathematical methods in the theory of queueing    A. Y. KHINTCHINE


A  COURSE  IN 

THE  GEOMETRY  OF 
n  DIMENSIONS 


M.  G.  KENDALL,  Sc.D. 

formerly  Professor  of  Statistics  in  the  University  of  London 
President,  Royal  Statistical  Society,  1960-62 


being   NUMBER   EIGHT   of 
GRIFFIN'S       STATISTICAL 
MONOGRAPHS  &  COURSES 

EDITED  BY 

M.   G.   KENDALL,   Sc.D. 


HAFNER  PUBLISHING  COMPANY 
NEW  YORK 


Copyright  ©  1961 

CHARLES  GRIFFIN  &  COMPANY  LIMITED 

42  DRURY  LANE,  LONDON,  WC  2 



First  published  in  1961 


PRINTED  IN  GREAT  BRITAIN  BY  JOHN  WRIGHT  &  SONS  LTD.,  AT  THE  STONEBRIDGE  PRESS,  BRISTOL 


PREFACE 

The  geometry  of  n  dimensions,  as  developed  by  pure 
mathematicians,  is  a  somewhat  recondite  and  difficult  subject, 
most  of  which  is  remarkable  more  for  its  aesthetic  appeal  than 
for  its  utility.  Certain  branches  of  it,  however,  have  an 
immediate  application  in  statistics,  partly  in  clarifying  statistical 
ideas,  partly  in  solving  distributional  problems.  In  fact,  it  is 
not  easy  to  develop  a  comprehensive  theory  of  statistics  without 
introducing n-dimensional geometry at a fairly early stage.

In  teaching  statistics  at  the  advanced  level  I  have  found 
that  most  students,  even  those  with  a  good  mathematical 
background,  encounter  serious  difficulty  with  proofs  depending 
on n-dimensional systems. The following course was given at
the London School of Economics in 1960 to fill a gap in my
students' knowledge. It does not purport to be a complete
account of n-dimensional geometry or to replace the excellent
little  book  by  D.  M.  Y.  Sommerville,  first  published  in  1929  as 
An  Introduction  to  the  Geometry  of  N  Dimensions.  My  object 
was  to  set  out  that  part  of  the  subject  which  had  statistical 
applications  and  to  sketch  very  briefly  what  those  applications 
were.  Since  teachers  in  other  parts  of  the  world  doubtless 
encounter  similar  difficulties,  I  decided  to  publish  the  lecture 
notes  in  the  hope  that  they  might  be  found  generally  useful. 




Although  I  have  no  direct  experience,  I  suspect  that  there 
are  other  fields  of  applied  mathematics  where  the  ideas  of 
n-dimensional geometry are serviceable; and I hope, accordingly,
that the first half of the book, at least, may help students
and  teachers  outside  the  domain  of  theoretical  statistics. 

From the nature of the case, diagrams of n-dimensional
situations  are  difficult,  if  not  impossible,  to  present.  In 
lecturing,  however,  I  make  a  free  use  of  two-dimensional 
drawings  of  two-  and  three-dimensional  cases,  and  I  would 
recommend  the  student  to  get  into  the  habit  of  sketching  these 
simpler  cases  for  himself,  so  as  to  prepare  himself  for  the 
visualization of the n-dimensional extensions.

My  thanks  are  due  to  Mr.  T.  M.  F.  Smith  and  Mr. 
A.  W.  Matz,  who  read  the  typescript  of  the  book  and  materially 
helped  to  remove  obscurities  and  misprints.  Doubtless  some 
remain  and  I  should  be  grateful  to  any  reader  who  calls  them 
to  my  attention. 

M.  G.  K. 

London, 
May,  1961. 
PART 1
THE  GEOMETRY  OF  n  DIMENSIONS 

Introduction 

1  There  are  at  least  two  ways  in  which  a  geometry  may  be 
developed.  The  first  is  familiar  to  anyone  who  has  studied 
Euclid's  Elements  or  its  modern  equivalent  at  school.  It  begins 
with  certain  undefined  ideas  such  as  "point"  and  "straight 
line";  it  requires  them  to  obey  certain  postulates;  and  it  then 
proceeds  to  develop  a  series  of  propositions  by  purely  deduc- 
tive logic.  As  is  well  known,  a  remarkably  large  body  of 
theorems  can  be  deduced  from  comparatively  few  primitive 
ideas  in  this  field. 

2  The  second  approach,  originated  by  Descartes,  is  to  relate 
the  concepts  of  geometry  to  the  properties  of  numerical  co- 
ordinates. A  point  in  a  plane  is  defined  by  a  pair  of  numbers 
(x, y).  A  straight  line  is  the  locus  of  all  points  obeying  a  linear 
relation $lx + my + n = 0$; and so on. Coordinate geometry lacks
the  elegance  and  aesthetic  appeal  of  the  classical  Greek  system, 
but  it  is  much  more  powerful  and  enables  us  to  apply  to  the 
geometrical  domain  many  of  the  results  of  numerical,  algebraical 
and  analytical  mathematics. 

3  The  idea  of  "dimension"  itself  is,  on  the  axiomatic  approach, 
usually  introduced  intuitively.    We  all  "know"  what  is  meant 
by one, two and three dimensions. In actual fact, we experience
only three-dimensional objects, the two- and one-
dimensional  concepts  being  abstracts  from  observation.  They 
do  not  cause  us  much  difficulty.  When  we  wish  to  proceed 
from  three  to  more  dimensions,  however,  we  run  up  against  a 
conceptual  barrier  so  difficult  that  many  people  cannot 
surmount  it.  Just  how  far  certain  gifted  individuals  can 
visualize,  say,  a  space  of  four  dimensions  is  an  arguable  matter 
which  it  would  be  wise  to  leave  to  the  psychologists.  My  own 
opinion  is  that  nobody  can.  (The  so-called  four-dimensional 
continuum  consisting  of  three  spatial  and  one  temporal  dimen- 
sion is  quite  a  different  thing  and  is  not  relevant  here.)  But 
these  difficulties  of  "seeing  the  situation"  do  not  prevent  us 
from  setting  up  a  geometry  of  n  dimensions  in  the  classical 
sense  and  of  reasoning  about  it.* 

4  In  the  following  treatment  we  shall  rely  mainly  on  the  more 
modern  approach  through  coordinate  geometry.  A  point  in 
n  dimensions  is  defined  as  an  ordered  set  of  quantities 
$(x_1, x_2, \ldots, x_n)$ and there is no real difficulty in making $n$ as
large as we please. The geometry we shall be concerned with
is then equivalent to the mathematics of these ordered $n$-uples,
which we can also, if we wish, interpret as vectors (usually
column-vectors). From this viewpoint our "geometry"
becomes  a  branch  of  numerical  mathematics  couched  in  a 
particular  language,  and  it  is  always  possible  to  express  our 
results  in  a  non-geometrical  form.  For  example,  in  two  dimen- 
sions two  straight  lines  always  meet  in  a  single  point  except 
when  they  are  parallel.  This  is  equivalent  to  saying  that  the 
two  equations 

$$l_1 x + l_2 y + l_3 = 0$$

$$m_1 x + m_2 y + m_3 = 0$$

have a unique solution unless the determinant

$$\begin{vmatrix} l_1 & l_2 \\ m_1 & m_2 \end{vmatrix} = l_1 m_2 - l_2 m_1 = 0.$$

* The reader who is interested in the axiomatic approach will find
an account of it in Sommerville's book (1958) (see page 63).
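The equivalence between the geometrical and the algebraic statement can be tried numerically. The following is only a minimal sketch: the function name `meet_of_two_lines` and the use of exact fractions are choices made here for illustration, not part of the text.

```python
from fractions import Fraction

def meet_of_two_lines(l, m):
    """Solve l1*x + l2*y + l3 = 0 and m1*x + m2*y + m3 = 0.

    Returns the unique common point (x, y), or None when the
    determinant l1*m2 - l2*m1 vanishes (parallel or coincident lines).
    """
    l1, l2, l3 = (Fraction(v) for v in l)
    m1, m2, m3 = (Fraction(v) for v in m)
    det = l1 * m2 - l2 * m1
    if det == 0:
        return None
    # Cramer's rule applied to l1*x + l2*y = -l3, m1*x + m2*y = -m3.
    x = (l2 * m3 - l3 * m2) / det
    y = (l3 * m1 - l1 * m3) / det
    return x, y
```

For instance, the lines x + y - 2 = 0 and x - y = 0 give `meet_of_two_lines((1, 1, -2), (1, -1, 0))`, the point (1, 1), while two parallel lines return `None`.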


5  What,  then,  is  the  point  of  having  a  geometry  of  n  dimen- 
sions ?  Why  not  express  all  our  results  in  terms  of  algebraic 
quantities  which  make  no  demands  on  our  powers  of  visualiza- 
tion ?   There  are,  I  suggest,  three  reasons : 

(1)  The  language  of  geometry  is  much  simpler  and  more 
elegant  than  the  language  of  algebra.  To  say  that  in  three 
dimensions  a  plane  cuts  a  sphere  in  a  circle  is  very  much  simpler, 
and,  in  a  sense,  more  immediately  informative,  than  stating  the 
equivalent  in  terms  of  coordinates,  which  would  have  to  go 
something  like  this :  the  surface 

$$(x - a)^2 + (y - b)^2 + (z - c)^2 = r^2$$

determines on the plane

$$l_1 x + l_2 y + l_3 z + l_4 = 0$$

a curve which, with a suitable change of coordinates, can be
expressed in the form

$$(\xi - \alpha)^2 + (\eta - \beta)^2 = \rho^2.$$

(2)  It  is  possible  to  carry  through  quite  rigorous  trains  of 
reasoning  in  geometrical  terms  without  translating  them  into 
algebra.  This  gives  us  considerable  economy  both  in  thought 
and  in  communication  of  thought.  We  shall  meet  many 
examples  in  the  sequel.  One  reason  for  this  possibility  is  that, 
although  we  are  working  with  a  coordinate  system,  most  of 
the  results  we  need  are  independent  of  the  system  and  are 
therefore  invariant  under  certain  classes  of  coordinate  trans- 
formation. The  language  of  geometry  embodies  this  element 
of  invariance. 

(3)  Most  important,  perhaps,  is  the  fact  that  our  geometrical 
imagery  in  two  or  three  dimensions  suggests  results  for  more 


dimensions  and  offers  us  a  powerful  tool  of  inductive  or  creative 
reasoning.  For  example,  in  three  dimensions  two  spheres 
intersect  in  a  circle  (if  they  intersect  at  all).  This  immediately 
suggests  that  in  n  dimensions  two  hyperspheres  of  n  —  1 
dimensions  intersect  in  a  hypersphere  of  n  —  2  dimensions,  a 
proposition  which  is  easy  to  verify.  Some  of  the  results  which 
we  shall  encounter  later  were  suggested  by  this  kind  of 
analogical  argument.  Even  persons  who  refuse  to  recognize 
its  rigour  must  acknowledge  its  creative  possibilities.  Some  of 
the  results  arrived  at  in  this  manner  are,  indeed,  exceedingly 
difficult  to  establish  by  analytical  methods. 
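The verification of the proposition about two hyperspheres can be sketched as follows. The centres $a_i$, $b_i$ and the radii $R_1$, $R_2$ are notation introduced here purely for illustration.

```latex
% Two hyperspheres in n dimensions (assumed notation: centres a, b; radii R_1, R_2)
\sum_{i=1}^{n} (x_i - a_i)^2 = R_1^2, \qquad
\sum_{i=1}^{n} (x_i - b_i)^2 = R_2^2 .
% Subtracting the second equation from the first removes the squared terms in x:
2 \sum_{i=1}^{n} (b_i - a_i)\, x_i
  = R_1^2 - R_2^2 + \sum_{i=1}^{n} \left( b_i^2 - a_i^2 \right).
```

The difference is a linear equation (provided the centres are distinct), so the common points lie on a flat of $n - 1$ dimensions, and within that flat they satisfy the first equation, which is there a hypersphere of $n - 2$ dimensions when the intersection is real.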

6  Apart  from  the  simpler  undefinables  of  classical  geometry 
such  as  point,  line  and  curve,  we  shall  require  certain  other 
notions,  of  which  the  chief  are  distance,  angle  between  two 
lines,  and  volume  (or  content).  These  can  be  defined 
analytically, e.g. the distance between two points $(x_1, x_2, x_3)$
and $(y_1, y_2, y_3)$ in three dimensions may be defined as the
positive square root of

$$(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2.$$

In  geometries  where  distance  is  defined  differently  (for 
example,  in  Riemannian  geometries  where  spaces  are  "curved" 
and  a  metric  is  set  up  in  terms  of  a  quadratic  differential 
element)  our  intuitive  ideas  may  need  severe  conditioning. 
But  such  matters  will  not  concern  us  here.  If  we  have  to  deal 
with  curved  surfaces  they  will  be  immersed  in  flat  or  Euclidean 
spaces,  just  as  the  surface  of  an  ordinary  sphere  is  immersed 
in  the  ordinary  three  dimensions  of  experience.  We  shall  find 
that  in  n  dimensions  our  ideas  of  angle  and  volume  are  fairly 
straightforward  generalizations  of  the  three-dimensional  con- 
cepts. Straight  lines  are  the  shortest  distances  between  points, 
parallels  exist,  and  Pythagoras'  theorem  holds  almost  by 
definition. 
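The analytical definition of distance carries over to any number of coordinates without change. A minimal sketch, assuming points are given as equal-length sequences of numbers (the function name `distance` is chosen here for illustration):

```python
import math

def distance(x, y):
    """Positive square root of the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

With this definition Pythagoras' theorem holds by construction: `distance((0, 0, 0), (1, 2, 2))` is 3.0, and the same call works unchanged for four or more coordinates.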


Reality 

7  Questions  of  reality,  however,  sometimes  need  attention. 
In  axiomatic  geometries  such  as  the  two-dimensional  geometry 
of  the  schoolroom  we  are  concerned  almost  entirely  with  real 
points.  Thus,  for  example,  we  have  to  say  that  a  straight  line 
meets  a  circle  either  in  two  points,  or  in  one  point  (as  a  tangent), 
or  no  points.  From  the  analytical  viewpoint  we  can  put  this 
more  simply  at  the  expense  of  admitting  complex  numbers :  a 
linear  and  a  quadratic  form  in  two  variables  with  real  coefficients 
have two values in common: both real, both real and coincident,
or both imaginary. Attractive as this may be from the point of
view of generality, it is a resource which is not always open to
us in the applications of n-dimensional geometry which we are
going  to  consider.  In  integrating  over  a  domain  determined  by 
certain  boundaries,  for  example,  it  is  vital  to  know  how  those 
boundaries  intersect  in  the  real  domain. 

Varieties 

8 Consider, then, the aggregate of points typified by
$(x_1, x_2, \ldots, x_n)$, where the $x$'s can take any real values from
$-\infty$ to $\infty$. We shall refer to this space as an $S_n$.

Any equation in the variables $x$ determines a subspace in $S_n$.
This is called a variety. For the most part we shall be concerned
with varieties determined by rational integral algebraic
equations. If the degree of such an equation is $r$ we say that the
order of the variety is $r$ and write it as $V^r_{n-1}$. It is of $n - 1$
dimensions because $n - 1$ coordinates are sufficient to fix a
point on it. Likewise if we have a set of $p$ equations in
$x_1, x_2, \ldots, x_n$ they define a $V^h_{n-p}$, where $h$ is its order as
defined in section 13 below.

Intersections  of  varieties 

9  Two  varieties  in  an  Sn  may  or  may  not  have  points  in 
common.   For  example,  a  line  and  a  circle  in  three  dimensions 


may  not  meet;  a  circle  and  a  sphere  in  three  dimensions  will 
meet  (if  at  all  in  the  real  plane)  in  two  points  unless  the  circle 
lies entirely on the sphere; and so on. Generally, if a $V_{n-p}$ and
a $V_{n-q}$ have points in common, they are said to intersect. By the
definition of variety, this intersection is a variety. In general,
a $V_{n-p}$ and a $V_{n-q}$ intersect in a $V_{n-(p+q)}$. For the first is
determined by $p$ equations and the second by $q$, and hence their
common set of values by $p + q$ equations. If $p + q = n$ the
varieties will intersect only in points. If $p + q > n$ they will not,
in general, intersect at all.

By extension, a $V_{n-p}$, $V_{n-q}$ and $V_{n-r}$ intersect in a $V_{n-p-q-r}$.
This  is  easily  proved  by  induction,  and  the  extension  to  more 
varieties  is  immediate. 

10  To  avoid  prolixity  we  may,  at  this  point,  set  aside 
from  the  main  discussion  certain  degenerate  situations.  If,  of 
the  p  equations  determining  a  Vn_p  in  Sn,  some  are  dependent 
on  the  others  (i.e.  can  be  expressed  as  a  function  of  them)  we 
have fewer than $p$ independent conditions, say $p'$, and the variety
is of $n - p'$ dimensions. Likewise if $p$ independent and $q$
independent equations are not independent between themselves,
but are together equivalent to $t < p + q$ independent equations,
they determine a variety of dimension $n - t$. On occasion we
shall  have  to  consider  these  degeneracies  ad  hoc,  but  for  the 
most  part  we  shall,  unless  the  contrary  is  stated,  assume  that 
they  do  not  exist  or  that  they  have  been  removed  from  the 
situation  by  prior  scrutiny. 

11  Linear  spaces  are  of  particular  importance.  A  variety  of 
order 1 is called a hyperplane, a prime, or a flat. If the $p$
equations in the $(x_1, \ldots, x_n)$ are linear they define an $(n - p)$-flat,
which is an $S_{n-p}$. A $p$-flat, then, is a flat (Euclidean) space of
$p$ dimensions.

It  follows  from  the  definition  that  the  intersection  of  two 
flats  is  always  flat. 


An $S_1$ is called a (straight) line. In $n$ dimensions $n - 1$
equations are required to define it. As a matter of convenience
in symmetry, however, we sometimes write it in the form
($X$ relating to current coordinates, $x$ to a point on the line)

$$\frac{X_1 - x_1}{l_1} = \frac{X_2 - x_2}{l_2} = \frac{X_3 - x_3}{l_3} = \cdots = \frac{X_n - x_n}{l_n}, \qquad (1)$$

which comprises $n - 1$ independent equations. We also find
it convenient to represent the line with a parameter $\rho$ by
putting

$$X_1 = x_1 + \rho l_1$$

$$X_2 = x_2 + \rho l_2$$

etc. (2)

12 An $S_2$ is called a plane. Except in three dimensions it is
not so easy to write in a symmetrical form similar to equation (1),
but we can write it in parametric form

$$X_1 = x_1 + \rho l_1 + \sigma m_1$$

$$X_2 = x_2 + \rho l_2 + \sigma m_2$$

etc. (3)

Clearly if we eliminate $\rho$ and $\sigma$ from the $n$ equations (3) we
have $n - 2$ linear equations in the current coordinates $X$ which
define an $S_2$. Conversely, given $n - 2$ linear equations we can
reduce them to the form (3).

Example  1 

In $S_3$ two planes intersect in an $S_{3-1-1} = S_1$, namely a line.
Two lines intersect in an $S_{3-2-2} = S_{-1}$, i.e. do not intersect in
general. In four dimensions, $S_4$, two planes $S_2$ intersect in an
$S_{4-2-2} = S_0$, i.e. in a point. In $S_5$ they do not intersect in
general.

In $S_5$ the three spaces $S_4$, $S_3$, $S_3$ intersect in an $S_{5-1-2-2} = S_0$,
i.e. in a point.
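The dimension counting of Example 1 can be checked mechanically. The sketch below assumes NumPy is available and that each flat is supplied as a system of linear equations; both function names are hypothetical and chosen for illustration. The rank computation also covers the degenerate situations set aside in section 10, since dependent equations do not raise the rank.

```python
import numpy as np

def general_intersection_dimension(n, *codimensions):
    """Dimension n - p - q - ... expected for flats in general position.

    A negative value means the flats do not, in general, intersect.
    """
    return n - sum(codimensions)

def flat_intersection_dimension(n, systems):
    """Dimension of the common part of several flats in S_n.

    Each flat is given as a pair (A, b) meaning A @ x = b.
    Returns None when the combined equations are inconsistent.
    """
    A = np.vstack([np.atleast_2d(np.asarray(a, dtype=float)) for a, _ in systems])
    b = np.concatenate([np.atleast_1d(np.asarray(v, dtype=float)) for _, v in systems])
    rank_a = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_aug > rank_a:
        return None  # the equations have no common solution
    return n - rank_a
```

For example, the planes $x_1 = x_2 = 0$ and $x_3 = x_4 = 0$ in $S_4$ give dimension 0 (a single point), whereas the lines $x_1 = x_2 = 0$ and $x_1 = 1, x_3 = 0$ in $S_3$ give `None`.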


Example  2 

A straight line in $S_n$ intersects a $V^r_{n-1}$ in $r$ points (some of
which may be coincident or imaginary). For the $V^r_{n-1}$ is defined
by one equation in the $x$'s of degree $r$. If we substitute from
equations (2) we have a polynomial of degree $r$ in $\rho$, with $r$ roots.
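As an illustration of Example 2, take for the variety the hypersphere $\sum X_i^2 = c$, which is of order 2. Substituting the parametric form of the line gives a quadratic in the parameter. The sketch below assumes NumPy; the function name is hypothetical.

```python
import numpy as np

def line_hypersphere_parameters(x, l, c):
    """Parameter values at which the line X = x + rho * l meets sum(X_i^2) = c.

    Substitution gives (l.l) rho^2 + 2 (x.l) rho + (x.x - c) = 0,
    a polynomial of degree 2 in rho with 2 roots (possibly complex).
    """
    x = np.asarray(x, dtype=float)
    l = np.asarray(l, dtype=float)
    coefficients = [l @ l, 2.0 * (x @ l), x @ x - c]
    return np.roots(coefficients)
```

A line through the centre yields two real roots, a tangent line a repeated root, and a line missing the hypersphere a pair of conjugate complex roots, which are the "imaginary" intersections of the text.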

13 The order of a variety $V_p$ is most conveniently defined as
the number of points in which it intersects an arbitrary $S_{n-p}$.
This is consistent with our definition of the order of a $V_{n-1}$.

A $V^r_p$ and a $V^s_{n-p}$ in an $S_n$ will, in general, intersect in $rs$ points.
Consider, in fact, the $V_p$ defined by $n - p$ equations, say of
degree $\alpha_1, \alpha_2, \ldots, \alpha_{n-p}$. An arbitrary $S_{n-p}$ can be expressed, as
in equation (3), in terms of $n - p$ parameters. When we
substitute for the $x$'s and eliminate all but one of these parameters
we shall reach an equation in the remaining parameter of degree
$\alpha_1 \alpha_2 \cdots \alpha_{n-p}$. The original set of $n - p$ equations are thus
equivalent to one of such degree which is the order $r$. Likewise
there is one equation of order $s$ for $V_{n-p}$. These two have, in
general, $rs$ solutions.

Interpretation  over  matters  of  reality  is  nevertheless  to  be 
carried out with care. In two dimensions two conics ($V^2_1$) will
intersect  in  four  points.  But  two  circles  will  intersect  only  in 
two  real  points  at  most,  the  other  two  being  the  imaginary 
"circular  points"  at  infinity. 

Projection 

14 In $S_3$ we can project a curve on to a plane by selecting a
point,  joining  it  by  straight  lines  to  all  the  points  of  the  curve 
so  as  to  obtain  a  cone,  and  observing  the  curve  of  intersection 
of  the  cone  and  the  plane  of  projection.  This  is  conical  projec- 
tion. If  the  apex  of  the  cone  is  at  infinity  all  the  lines  of  projection 
are  parallel.  In  particular,  if  they  are  at  right  angles  to  the 
plane  of  projection  we  have  orthogonal  projection. 

The process can be carried out in $n$ dimensions. Given a $V_1$
we can select a point $O$ and construct the family of lines through
$O$ and $V_1$. These lines will trace out another $V_1$ by intersection
with a given $S_{n-1}$. We cannot, of course, project a $V_1$ by these
means on to a flat of dimension less than $n - 1$. We could
construct  more  general  methods  of  projection  if  necessary,  but 
they  are  rarely  required. 

Simple  figures  in  n  dimensions 

15 The generalization of a polygon in two dimensions is a
polytope. It is the figure bounded by a set of $(n - 1)$-flats. These
themselves intersect, adjacent $(n - 1)$-flats meeting in an $(n - 2)$-flat
and so on. $n$ faces meet in a point or vertex. The flat of $r$
dimensions is called an $r$-boundary. Thus, in three dimensions
the  figure  is  bounded  by  planes ;  these  meet  in  lines,  the  edges ; 
and  these  meet  in  points,  the  corners  or  vertices. 

16 The least number of $(n - 1)$-flats which can enclose a space
and form a polytope is $n + 1$. Thus in two dimensions we have
the  triangle  and  in  three  dimensions  the  tetrahedron.  Such  a 
figure  is  called  a  simplex. 

Example  3 

The  triangle  has  3  sides,  3  vertices. 

The  tetrahedron  has  4  faces,  6  sides,  4  vertices. 

Consider the simplex in 4 dimensions. This has 5 $S_3$'s as
"faces". How many 2-flats, 1-flats and 0-flats has it?

The answer is 10, 10, 5. The 3-flats meet in pairs in $\binom{5}{2} = 10$
ways to form the 2-flats. They meet in triplets in $\binom{5}{3} = 10$ ways
to form the 1-flats; and in sets of four in $\binom{5}{4} = 5$ ways to form
the vertices.

The general law for $n$ dimensions will now be clear. The
number of flats of dimension $n - 1, n - 2, \ldots$ are the successive
terms in the binomial $(1 + 1)^{n+1}$, omitting the first and last
terms, which are each equal to unity.
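The general law can be tabulated directly from the binomial coefficients. A minimal sketch (the function name `simplex_boundaries` is introduced here for illustration): the number of r-flats of a simplex in n dimensions is taken as the number of ways of choosing r + 1 of its n + 1 vertices.

```python
from math import comb

def simplex_boundaries(n):
    """Return {r: number of r-flats} for r = 0, ..., n - 1 of a simplex in n dimensions."""
    return {r: comb(n + 1, r + 1) for r in range(n)}
```

`simplex_boundaries(3)` reproduces the tetrahedron (4 vertices, 6 edges, 4 faces) and `simplex_boundaries(4)` the counts 5, 10, 10, 5 found above; in every case the counts sum to $2^{n+1} - 2$.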

17 A parallelotope is the $n$-dimensional analogue of the
parallelogram and is bounded by pairs of parallel $(n - 1)$-flats.
Anticipating a little our treatment of angle and distance, we
may define an orthotope as the analogue of a rectangle, in which
bounding $(n - 1)$-flats are perpendicular, and a hypercube as an
orthotope in which the parallel bounding $(n - 1)$-flats are all
the same distance apart.

Example  4 

Consider a parallelogram in two dimensions. Take a third
dimension and a line in it through one of the corners of the
parallelogram. Consider the parallelogram as displaced with
its corner moving along this line, remaining parallel to its
original plane. It will then trace out a three-dimensional
parallelogram or parallelopiped. Likewise, a parallelotope in
$n$ dimensions can be regarded as generated by an $(n - 1)$-dimensional
parallelotope moving parallel to itself along a
straight line making an angle $\theta$ with the $(n - 1)$ parallelotope.
Let $N_r$ be the number of $r$-boundaries of the $n$-dimensional
parallelotope and $N'_r$ the number of $r$-boundaries of the
$(n - 1)$-dimensional parallelotope. Consider the generating
function

$$N_0 + N_1 t + \cdots + N_n t^n, \qquad (4)$$

where $N_n = 1$. The $N_r$ boundaries are produced by the $N'_{r-1}$
boundaries of the $(n - 1)$-dimensional parallelotope, each of which
sweeps out an $r$-boundary as it moves, together
with its $N'_r$ boundaries of $r$ dimensions in their initial and final
positions. Hence

$$N_r = N'_{r-1} + 2N'_r.$$

Thus

$$\sum_{j=0}^{n} N_j t^j = (2 + t) \sum_{j=0}^{n-1} N'_j t^j,$$

whence, by induction,

$$\sum_{r=0}^{n} N_r t^r = (2 + t)^n. \qquad (5)$$

This arrays the $r$-boundaries of the parallelotope and we have

$$N_r = 2^{n-r} \frac{n!}{r!\,(n - r)!}. \qquad (6)$$
For example, a hypercube in four dimensions has

8 three-dimensional (cubic) "hyperfaces"
24 two-dimensional (square) "faces"
32 one-dimensional "edges"
16 vertices.
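The counts for the parallelotope can be obtained either from the closed form (6) or by repeating the sweeping argument of Example 4 dimension by dimension. The sketch below does both; the function names are chosen here for illustration, and the recursion starts from a single point, an assumption made so that the induction has a base case.

```python
from math import comb

def parallelotope_boundaries(n):
    """N_r = 2^(n-r) * n! / (r! (n-r)!) for r = 0, ..., n, as in equation (6)."""
    return [2 ** (n - r) * comb(n, r) for r in range(n + 1)]

def parallelotope_boundaries_by_sweeping(n):
    """Build the same counts from the recursion N_r = N'_(r-1) + 2 N'_r."""
    counts = [1]  # zero dimensions: a single point
    for _ in range(n):
        previous = counts + [0]  # N'_r = 0 beyond the previous dimension
        counts = [
            2 * previous[r] + (previous[r - 1] if r > 0 else 0)
            for r in range(len(previous))
        ]
    return counts
```

For n = 4 both functions return `[16, 32, 24, 8, 1]`, that is 16 vertices, 32 edges, 24 faces, 8 hyperfaces and the figure itself.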

18 If $Q$ is a quadratic form in the variables $x_1, \ldots, x_n$ (which
may also include linear terms), the $V_{n-1}$ defined by

$$Q = \text{constant}$$

is a hyperquadric. As we shall see later, a transformation to a
new origin can be made to eliminate the linear terms, and with
such an origin our most general hyperquadric may be written

$$\sum_{j,k=1}^{n} a_{jk} x_j x_k = c. \qquad (7)$$

If $c = 0$ we have a hypercone with vertex at the origin. In the
special case

$$\sum_{j=1}^{n} x_j^2 = c \qquad (8)$$

we have a hypersphere.

Coordinate  transformations 

19 Consider a transformation

$$y_i = \sum_{j=1}^{n} l_{ij} x_j + a_i, \qquad i = 1, 2, \ldots, n. \qquad (9)$$

This can be regarded as a displacement of the origin,
represented by $a_i$, and a "rotation", represented by the
coefficients $l_{ij}$. It is usually convenient to consider these
separately.

Of particular importance is the so-called "orthogonal"
transformation

$$y_i = \sum_{j=1}^{n} l_{ij} x_j \qquad (10)$$
where

$$\sum_{j=1}^{n} l_{ij} l_{kj} = \delta_{ik} = \begin{cases} 1, & i = k \\ 0, & i \neq k. \end{cases} \qquad (11)$$

Coefficients $l$ can always be chosen so as to obey (11). In fact,
in $p$ dimensions the conditions impose $p + \tfrac{1}{2}p(p - 1) = \tfrac{1}{2}p(p + 1)$ constraints on
$p^2$ constants, leaving us $\tfrac{1}{2}p(p - 1)$ further conditions at disposal.
We may say that the transformation has $\tfrac{1}{2}p(p - 1)$ degrees of
freedom.

Writing $L$ for the matrix $(l_{ij})$ and $L'$ for its transpose we see
that (11) is equivalent to

$$LL' = I.$$

Thus

$$|L|\,|L'| = 1,$$

and since the determinants of the matrix and its transpose are
equal, we have

$$|L|^2 = 1. \qquad (12)$$

We shall usually take the determinant as +1. (The negative
sign corresponds to a "left-handed" transposition and does not
affect the properties with which we are concerned.)
Note also that

$$Y = LX,$$

and hence

$$L'Y = L'LX = X. \qquad (13)$$

Thus the transformation is bi-orthogonal.*

* At this stage "orthogonal" may be regarded as a convenient term
to describe this type of transformation. From the definition of angle
in section 23 it follows that it has the usual properties of rectangularity.

Helmert's transformation

20 One particular transformation is of special interest in
statistics. We put

$$\begin{aligned}
y_1 &= \frac{1}{\sqrt{2}} (x_1 - x_2) \\
y_2 &= \frac{1}{\sqrt{6}} (x_1 + x_2 - 2x_3) \\
y_3 &= \frac{1}{\sqrt{12}} (x_1 + x_2 + x_3 - 3x_4) \\
&\;\;\vdots \\
y_{n-1} &= \frac{1}{\sqrt{n(n-1)}} \{x_1 + x_2 + \cdots + x_{n-1} - (n - 1)x_n\} \\
y_n &= \frac{1}{\sqrt{n}} (x_1 + x_2 + \cdots + x_n).
\end{aligned} \qquad (14)$$

The reader should verify that this transformation obeys
the conditions (11).
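The verification asked of the reader can also be carried out numerically. The sketch below assumes NumPy and builds the matrix of coefficients row by row, following the standard form of Helmert's transformation (row $i$, for $i < n$, has $i$ equal coefficients followed by $-i$ times that coefficient, with divisor $\sqrt{i(i+1)}$, and the last row is constant); the function name `helmert_matrix` is chosen for illustration.

```python
import numpy as np

def helmert_matrix(n):
    """Matrix L of Helmert's transformation, so that y = L @ x."""
    L = np.zeros((n, n))
    for i in range(1, n):                # rows y_1, ..., y_(n-1)
        scale = 1.0 / np.sqrt(i * (i + 1))
        L[i - 1, :i] = scale             # coefficients of x_1, ..., x_i
        L[i - 1, i] = -i * scale         # coefficient of x_(i+1)
    L[n - 1, :] = 1.0 / np.sqrt(n)       # last row: y_n
    return L
```

The conditions (11) then amount to `np.allclose(L @ L.T, np.eye(n))`, and the determinant of the matrix is +1 or -1 as in (12).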

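To reverse the transformation we use the last two equations
of (14) to obtain $x_n$. Then the last three to obtain $x_{n-1}$ and so on.
We may also use the property expressed in (13) and, reading
the columns downward, write down at once

$$\begin{aligned}
x_1 &= \frac{1}{\sqrt{2}} y_1 + \frac{1}{\sqrt{6}} y_2 + \cdots + \frac{1}{\sqrt{n(n-1)}} y_{n-1} + \frac{1}{\sqrt{n}} y_n \\
x_2 &= -\frac{1}{\sqrt{2}} y_1 + \frac{1}{\sqrt{6}} y_2 + \cdots + \frac{1}{\sqrt{n(n-1)}} y_{n-1} + \frac{1}{\sqrt{n}} y_n \\
x_3 &= -\frac{2}{\sqrt{6}} y_2 + \cdots + \frac{1}{\sqrt{n(n-1)}} y_{n-1} + \frac{1}{\sqrt{n}} y_n \\
&\;\;\vdots \\
x_{n-1} &= -\frac{n - 2}{\sqrt{(n-1)(n-2)}} y_{n-2} + \frac{1}{\sqrt{n(n-1)}} y_{n-1} + \frac{1}{\sqrt{n}} y_n \\
x_n &= -\frac{n - 1}{\sqrt{n(n-1)}} y_{n-1} + \frac{1}{\sqrt{n}} y_n.
\end{aligned} \qquad (15)$$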
From the last equation in (14), or adding the columns in (15),
we see that the mean of the $x$'s is given by

$$\bar{x} = y_n / \sqrt{n}. \qquad (16)$$

Hence the variables

$$x_1 - \bar{x}, \; x_2 - \bar{x}, \; \ldots, \; x_n - \bar{x}$$

depend only on $y_1, \ldots, y_{n-1}$.

Example 5

Consider the quadratic form $\sum_{i=1}^{n} x_i^2$. We have the simple
identity

$$\sum x_i^2 = \sum (x_i - \bar{x})^2 + n\bar{x}^2.$$

If we then transform to $y$'s by (14), in virtue of the remark just
made,

$$\sum x_i^2 = (\text{quadratic form in } y_1, y_2, \ldots, y_{n-1}) + y_n^2.$$

This is a well-known result in theoretical statistics. If $n$
independent variables $x$ are distributed in the normal form
with zero mean and unit variance, their joint frequency is
given by

$$dF = (2\pi)^{-n/2} \exp\left(-\tfrac{1}{2} \sum x_i^2\right) dx_1 \cdots dx_n. \qquad (17)$$

When we transform to $y$'s (the Jacobian being unity) we have

$$dF \propto \exp(\text{quadratic in } y_1, \ldots, y_{n-1})\, dy_1 \cdots dy_{n-1} \, \exp(-\tfrac{1}{2} y_n^2)\, dy_n,$$

and hence $y_n$, and with it the mean of the $x$'s, is independent of $\sum (x_i - \bar{x})^2$,
and therefore of the variance of $x$.

Furthermore, in virtue of the orthogonality of the
transformation,

$$\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i^2$$

and thus

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n-1} y_i^2.$$

Hence the sum $\sum (x_i - \bar{x})^2$ is distributed as the sum of squares of $n - 1$
variables $y_1, \ldots, y_{n-1}$, which also are independent with zero
mean and unit variance.
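The identities of Example 5 can be checked on any set of numbers. The sketch below assumes NumPy; it rebuilds the Helmert matrix in the standard form (a helper introduced here for illustration) and returns the quantities that the example asserts to be equal. It checks the algebraic identities only, not the distributional statement.

```python
import numpy as np

def helmert_matrix(n):
    """Matrix of Helmert's transformation in its standard form, y = L @ x."""
    L = np.zeros((n, n))
    for i in range(1, n):
        scale = 1.0 / np.sqrt(i * (i + 1))
        L[i - 1, :i] = scale
        L[i - 1, i] = -i * scale
    L[n - 1, :] = 1.0 / np.sqrt(n)
    return L

def example_5_quantities(x):
    """Return the pairs of quantities that Example 5 states are equal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = helmert_matrix(n) @ x
    return {
        "mean": (x.mean(), y[-1] / np.sqrt(n)),                          # equation (16)
        "sum_of_squares": (np.sum(x ** 2), np.sum(y ** 2)),              # orthogonality
        "squared_deviations": (np.sum((x - x.mean()) ** 2), np.sum(y[:-1] ** 2)),
    }
```

For the sample 2, 4, 4, 6 the mean is 4 and the sum of squared deviations is 8, and each pair returned by `example_5_quantities([2, 4, 4, 6])` agrees to rounding error.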

Example  6 

Consider the bivariate form, similar to that of the previous
example

$$\sum_{i=1}^{n} x_i^2 - 2\rho \sum_{i=1}^{n} x_i u_i + \sum_{i=1}^{n} u_i^2
= \sum (x - \bar{x})^2 - 2\rho \sum (x - \bar{x})(u - \bar{u}) + \sum (u - \bar{u})^2
+ n\{\bar{x}^2 - 2\rho \bar{x}\bar{u} + \bar{u}^2\}. \qquad (18)$$

Transforming by a Helmert transformation from $x$ to $y$ and
from $u$ to $v$, we see that this is equal to the sum of two
components, a quadratic form in $y_1, \ldots, y_{n-1}, v_1, \ldots, v_{n-1}$ and a
quadratic in $y_n$ and $v_n$ (or $\bar{x}$ and $\bar{u}$).

Thus, in samples of $n$ from the bivariate normal form

$$dF \propto \exp\left\{ -\frac{1}{2(1 - \rho^2)} (x^2 - 2\rho x u + u^2) \right\} dx\, du$$

the mean statistics $\bar{x}$, $\bar{u}$ are distributed independently of a
quadratic form in $x_i - \bar{x}$ and $u_i - \bar{u}$.

Polar  coordinates  in  n  dimensions 

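21 In two dimensions we have the familiar polar
transformation

$$x_1 = r \cos\theta, \quad x_2 = r \sin\theta, \qquad 0 \le r \le \infty, \quad 0 \le \theta \le 2\pi \qquad (19)$$

with

$$x_1^2 + x_2^2 = r^2 \qquad (20)$$

and a Jacobian

$$J = \frac{\partial(x_1, x_2)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & \sin\theta \\ -r \sin\theta & r \cos\theta \end{vmatrix} = r. \qquad (21)$$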
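
In three dimensions, in so-called spherical polars, we have

$$x_1 = r\cos\theta_1\cos\theta_2, \quad x_2 = r\cos\theta_1\sin\theta_2, \quad x_3 = r\sin\theta_1,$$

$$0 \le r \le \infty, \quad -\tfrac{1}{2}\pi \le \theta_1 \le \tfrac{1}{2}\pi, \quad 0 \le \theta_2 \le 2\pi, \tag{22}$$

with

$$\sum_{i=1}^{3} x_i^2 = r^2$$

and, apart from sign (writing $c$ for cos and $s$ for sin),

$$J = \frac{\partial(x_1, x_2, x_3)}{\partial(r, \theta_1, \theta_2)} = \begin{vmatrix} c_1 c_2 & c_1 s_2 & s_1 \\ -r s_1 c_2 & -r s_1 s_2 & r c_1 \\ -r c_1 s_2 & r c_1 c_2 & 0 \end{vmatrix} = r^2\cos\theta_1. \tag{23}$$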


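Now consider

$$\begin{aligned}
x_1 &= r c_1 c_2 \cdots c_{n-2} c_{n-1}, & 0 &\le \theta_{n-1} \le 2\pi, \\
x_2 &= r c_1 c_2 \cdots c_{n-2} s_{n-1}, & -\tfrac{1}{2}\pi &\le \theta_{n-2} \le \tfrac{1}{2}\pi, \\
x_3 &= r c_1 c_2 \cdots c_{n-3} s_{n-2}, & -\tfrac{1}{2}\pi &\le \theta_{n-3} \le \tfrac{1}{2}\pi, \\
&\;\;\vdots \\
x_n &= r s_1, & -\tfrac{1}{2}\pi &\le \theta_1 \le \tfrac{1}{2}\pi.
\end{aligned} \tag{24}$$

Note that all the angles go from $-\tfrac{1}{2}\pi$ to $\tfrac{1}{2}\pi$ except $\theta_{n-1}$,
which varies from $0$ to $2\pi$. It is easy to see that

$$\sum_{i=1}^{n} x_i^2 = r^2. \tag{25}$$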
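
The Jacobian is

$$J = r^{n-1}\begin{vmatrix}
c_1 c_2 \cdots c_{n-1} & c_1 c_2 \cdots c_{n-2} s_{n-1} & \cdots & s_1 \\
-s_1 c_2 \cdots c_{n-1} & -s_1 c_2 \cdots c_{n-2} s_{n-1} & \cdots & c_1 \\
\vdots & \vdots & & \vdots \\
-c_1 c_2 \cdots c_{n-2} s_{n-1} & c_1 c_2 \cdots c_{n-2} c_{n-1} & \cdots & 0
\end{vmatrix}$$

$$= r^{n-1} c_1^{n-1} c_2^{n-2} \cdots c_{n-1}\, s_1 s_2 \cdots s_{n-1}\begin{vmatrix}
1 & 1 & \cdots & 1 & 1 \\
-t_1 & -t_1 & \cdots & -t_1 & 1/t_1 \\
-t_2 & -t_2 & \cdots & 1/t_2 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
-t_{n-1} & 1/t_{n-1} & \cdots & 0 & 0
\end{vmatrix}$$

where $t$ stands for tangent. Subtracting each column from the
preceding one, we find, with a little manipulation, that apart from sign

$$J = r^{n-1} c_1^{n-2} c_2^{n-3} \cdots c_{n-2}. \tag{26}$$

It is readily verified that for $n = 2$ and $n = 3$ these results agree
with those obtained directly, e.g. in (21) and (23).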

Geometrically, we may picture the situation in this way:
consider the space with an origin and hyperspheres of constant
$r$ centred on it like the layers of an onion. (The surface of
constant $r$ is then a $V_{n-1}$.) The flat $x_n$ = constant corresponds
to $s_1$ = constant or $\theta_1$ = constant. The space for which this is
true is given by

$$\sum_{i=1}^{n-1} x_i^2 = r^2\cos^2\theta_1$$

which is a $V_{n-2}$, another hypersphere, in the $S_{n-1}$ determined
by $x_n$ = constant. Similarly in this $S_{n-1}$ the flat $x_{n-1}$ = constant
corresponds to $\theta_2$ = constant and determines a hypersphere
$V_{n-3}$ in an $S_{n-2}$; and so on with diminishing dimension number
until we reach a circle in a plane.
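
The polar transformation (24) and the Jacobian (26) lend themselves to a numerical check. The following is a minimal sketch: the function names, the finite-difference step and the test point are assumptions made for illustration, and the determinant is compared with (26) in absolute value only.

```python
import math

def polar_to_cartesian(r, thetas):
    """Map (r, theta_1, ..., theta_{n-1}) to (x_1, ..., x_n) as in (24)."""
    n = len(thetas) + 1
    x = [0.0] * n
    prod = r                                  # running product r c_1 ... c_{k-1}
    for k, theta in enumerate(thetas, start=1):
        x[n - k] = prod * math.sin(theta)     # x_{n-k+1} = r c_1 ... c_{k-1} s_k
        prod *= math.cos(theta)
    x[0] = prod                               # x_1 = r c_1 ... c_{n-1}
    return x

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return det

def numeric_jacobian(r, thetas, h=1e-6):
    """Determinant of d(x)/d(r, theta) by central differences."""
    params = [r] + list(thetas)
    rows = []
    for k in range(len(params)):
        up = params[:]
        dn = params[:]
        up[k] += h
        dn[k] -= h
        xu = polar_to_cartesian(up[0], up[1:])
        xd = polar_to_cartesian(dn[0], dn[1:])
        rows.append([(a - b) / (2 * h) for a, b in zip(xu, xd)])
    return determinant(rows)

def jacobian_formula(r, thetas):
    """Right-hand side of (26): r^(n-1) c_1^(n-2) c_2^(n-3) ... c_(n-2)."""
    n = len(thetas) + 1
    value = r ** (n - 1)
    for k, theta in enumerate(thetas, start=1):
        value *= math.cos(theta) ** (n - 1 - k)
    return value

r = 1.7
thetas = [0.3, -0.4, 0.9, 2.0]                # n = 5; illustrative values only
x = polar_to_cartesian(r, thetas)
```

At this point the sum of squares of `x` reproduces $r^2$ as in (25), and the absolute value of `numeric_jacobian(r, thetas)` agrees with `jacobian_formula(r, thetas)` to within the accuracy of the finite differences.
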

Example 7

Consider again samples of $n$ from a normal distribution
with zero mean and unit variance. The distribution becomes

$$dF \propto \exp\left(-\tfrac{1}{2}\sum x_i^2\right) dx_1 \cdots dx_n$$

$$\propto \exp\left(-\tfrac{1}{2}r^2\right) r^{n-1}\, dr\; c_1^{n-2} c_2^{n-3} \cdots c_{n-2}\, d\theta_1 \cdots d\theta_{n-1}. \tag{27}$$

Thus the distribution of $r$ is independent of the distribution of
the angles $\theta_1, \ldots, \theta_{n-1}$ and is given by

$$dF \propto \exp\left(-\tfrac{1}{2}r^2\right) r^{n-1}\, dr. \tag{28}$$

This is the familiar distribution of $\chi^2$, with $r^2 = \chi^2$.

The independence of $r$ and the angles $\theta$ has one other far-reaching
consequence. If we have two algebraic forms homogeneous
and of the same degree in the $x$'s, say $f_1$ and $f_2$, their
ratio

$$t = f_1/f_2 \tag{29}$$

is of degree zero in $x$. Thus, in virtue of (27), the expectation
of any power of $t$ is independent of $r$. Consequently, if $f_2$ is a
function of $r$, the moments of $t$ (and hence the distribution of $t$)
are independent of $r$. Thus, for normal variation, with $t$ and $f_2$
independent,

$$E(f_1^k) = E(t^k f_2^k) = E(t^k)E(f_2^k)$$

and hence

$$E(t^k) = \frac{E(f_1^k)}{E(f_2^k)}, \tag{30}$$

a formula which enables us to calculate the moments of the
ratio $t$ from those of its numerator and denominator.
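
Formula (30) can be illustrated by simulation. The sketch below takes the hypothetical case $n = 2$, $f_1 = x_1^2$, $f_2 = x_1^2 + x_2^2 = r^2$ and $k = 2$, for which $E(f_1^2) = 3$ and $E(f_2^2) = 8$, so that (30) gives $E(t^2) = 3/8$. The seed, sample size and function name are arbitrary choices for illustration.

```python
import random

def simulate_ratio_moment(n=2, k=2, draws=100000, seed=1):
    """Estimate E(t^k) directly and via E(f1^k) / E(f2^k), as in (30).

    f1 = x_1^2 and f2 = sum of x_i^2 are both homogeneous of degree 2,
    and f2 is a function of r alone.
    """
    rng = random.Random(seed)
    sum_t = sum_f1 = sum_f2 = 0.0
    for _ in range(draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        f1 = x[0] ** 2
        f2 = sum(v * v for v in x)
        sum_t += (f1 / f2) ** k
        sum_f1 += f1 ** k
        sum_f2 += f2 ** k
    direct = sum_t / draws                 # sample mean of t^k
    from_formula = sum_f1 / sum_f2         # ratio of sample moments
    return direct, from_formula

direct, from_formula = simulate_ratio_moment()
exact = 3.0 / 8.0                          # E(x^4) / E(r^4) for n = 2
```

Both estimates should lie close to 3/8, up to sampling error; the agreement is statistical, not exact.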

Equation  of  a  flat  through  given  points 

22  In $n$ dimensions the general equation of an $(n-1)$-flat is

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = k. \tag{31}$$

If this passes through $n$ points with coordinates $x_{1i}, x_{2i}, \ldots, x_{ni}$,
$i = 1, 2, \ldots, n$, we have $n$ equations typified by

$$a_1 x_{1i} + a_2 x_{2i} + \cdots + a_n x_{ni} = k, \qquad i = 1, 2, \ldots, n. \tag{32}$$

Eliminating the $n + 1$ constants $a_1, a_2, \ldots, a_n, k$ from the $(n + 1)$
equations (31) and (32), we have for the equation of the
$(n-1)$-flat

$$\begin{vmatrix}
x_1 & x_2 & \cdots & x_n & 1 \\
x_{11} & x_{21} & \cdots & x_{n1} & 1 \\
x_{12} & x_{22} & \cdots & x_{n2} & 1 \\
\vdots & \vdots & & \vdots & \vdots \\
x_{1n} & x_{2n} & \cdots & x_{nn} & 1
\end{vmatrix} = 0. \tag{33}$$

If the matrix $(x_{ij})$ is not of rank $n$ the points are not independent
and the $(n-1)$-flat is not uniquely determined. The
generalization to $n$ dimensions of results which are familiar in
two and three will offer no difficulty. For example, with $n = 3$,
if the matrix $(x_{ij})$ is of rank 2 the three points lie on a line and
a single infinity of planes will contain them.
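
As an illustration of (33), the coefficients of the flat may be read off as the signed minors of the first row of the determinant. The sketch below does this for three points in three dimensions; the function names and the sample points are assumptions made for the example.

```python
def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return det

def flat_through_points(points):
    """Return (a, k) such that a_1 x_1 + ... + a_n x_n = k, as in (31).

    `points` is a list of n points, each with n coordinates. The
    coefficients are the cofactors of the first row of (33).
    """
    n = len(points)
    rows = [list(p) + [1.0] for p in points]      # rows 2..n+1 of (33)
    cofactors = []
    for col in range(n + 1):
        minor = [[row[j] for j in range(n + 1) if j != col] for row in rows]
        cofactors.append((-1) ** col * determinant(minor))
    a = cofactors[:n]
    k = -cofactors[n]                             # move the constant to the right
    return a, k

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
a, k = flat_through_points(points)
```

For these three points the result is proportional to $x_1 + x_2 + x_3 = 1$. If the points are not independent all the cofactors vanish, which corresponds to the indeterminate case described above.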


Angles 

23  Any  two  intersecting  straight  lines,  in  however  many 
dimensions,  determine  a  plane  which  contains  them  both.  The 
angle  between  them  in  this  plane  is  unique.  Likewise,  if  two 
lines  do  not  intersect,  we  can  select  a  point  on  one  and  draw 
through  it  a  parallel  to  the  other.  The  angle  between  the  two 
intersecting  lines  is  independent  of  where  we  choose  the  point 
or  which  line  we  place  it  on,  and  hence  determines  the  (unique) 
angle  between  the  lines. 

Consider now a line in $n$ dimensions given by equation (1):

$$\frac{X_1 - x_1}{l_1} = \frac{X_2 - x_2}{l_2} = \cdots = \frac{X_n - x_n}{l_n}.$$

This goes through the point $(x_1, x_2, \ldots, x_n)$. If we draw a
parallel through the origin $O$ we shall have the line

$$\frac{X_1}{l_1} = \frac{X_2}{l_2} = \cdots = \frac{X_n}{l_n}. \tag{34}$$

The constants $l_i$ are indeterminate in the sense that the line
remains the same if we multiply them all by the same constant.
We may, therefore, without loss of generality, require them to
obey the normalizing condition

$$\sum l_i^2 = 1. \tag{35}$$

Now consider a plane $X_1 = a_1$. This will meet the line in a
point $P$, say, with coordinates $a_1$, $l_2 a_1/l_1$, $l_3 a_1/l_1$, etc. The distance
of this point from the origin is given by

$$OP^2 = a_1^2\left\{1 + \left(\frac{l_2}{l_1}\right)^2 + \cdots + \left(\frac{l_n}{l_1}\right)^2\right\} = \frac{a_1^2}{l_1^2}, \quad \text{in virtue of (35).}$$

Thus

$$l_1 = \frac{a_1}{OP}.$$

Now $a_1/OP$ is the cosine of the angle between the line and the
$X_1$-axis. Similarly $l_i$ is the cosine of the angle between the line
and the $X_i$-axis. The constants $l_i$ are therefore known as
direction cosines.

It follows that the orthogonal projection of a point $(x_1, \ldots, x_n)$
on to the line (34) is distant $\sum l_i x_i$ from the origin. The square of its distance
from the line is then given by

$$\sum x_i^2 - \left(\sum l_i x_i\right)^2 = \sum l_i^2 \sum x_i^2 - \left(\sum l_i x_i\right)^2.$$

24  Consider now a second line with direction cosines $l_i'$. The
projection of $OP$ on to this line is $OP\cos\phi$ where $\phi$ is the angle
between the lines. The projection of the line represented by $l$
on the $X_i$ axis is $OP\,l_i$, and the projection of $OP$ on $l'$ is then
the sum of terms $OP\,l_i l_i'$. Thus

$$\cos\phi = \sum l_i l_i'. \tag{36}$$

This is a fundamental formula which is a direct extension of the
results for two and three dimensions. The lines are orthogonal
if $\sum l l' = 0$. We may also write

$$\sin^2\phi = 1 - \left(\sum l l'\right)^2 = \sum l^2 \sum l'^2 - \left(\sum l l'\right)^2 = \tfrac{1}{2}\sum_{i,j} (l_i l_j' - l_j l_i')^2, \qquad i, j = 1, 2, \ldots, n. \tag{37}$$
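
The relations (35) to (37), and the distance of a point from a line through the origin, can be tried on a small case. In the sketch below the helper names and the two sample directions in four dimensions are illustrative assumptions.

```python
import math

def direction_cosines(v):
    """Scale v so that the sum of squares is unity, as required by (35)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def cos_angle(l, m):
    """Cosine of the angle between two lines, formula (36)."""
    return sum(a * b for a, b in zip(l, m))

def sin2_angle(l, m):
    """Last member of (37): half the sum over all i, j of (l_i m_j - l_j m_i)^2."""
    n = len(l)
    return 0.5 * sum((l[i] * m[j] - l[j] * m[i]) ** 2
                     for i in range(n) for j in range(n))

def distance_from_line(x, l):
    """Distance of the point x from the line through O with direction cosines l."""
    projection = sum(a * b for a, b in zip(l, x))
    return math.sqrt(max(sum(c * c for c in x) - projection ** 2, 0.0))

l = direction_cosines([1.0, 2.0, 2.0, 0.0])
m = direction_cosines([2.0, 0.0, 1.0, 2.0])
cos_phi = cos_angle(l, m)
sin2_phi = sin2_angle(l, m)
d = distance_from_line([1.0, 0.0, 0.0, 0.0], l)
```

Here `cos_phi` is $4/9$ and `sin2_phi` is $1 - (4/9)^2$, so that the two members of (37) agree.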

25  Consider now an $(n-1)$-flat which, without loss of
generality, we may suppose to go through the origin. It has,
say, the equation

$$\sum_{i=1}^{n} a_i X_i = 0. \tag{38}$$

A line, also through the origin, of the form

$$\frac{X_1}{l_1} = \cdots = \frac{X_n}{l_n} = \rho, \tag{39}$$

will lie in the $(n-1)$-flat if and only if

$$\sum a_i X_i = \rho\sum a_i l_i = 0. \tag{40}$$

Thus, the line with direction cosines proportional to the $a$'s
will be orthogonal to any line in the $(n-1)$-flat. This perpendicular
is called the normal to the $(n-1)$-flat. If we write the
equation of an arbitrary $(n-1)$-flat in the form

$$\sum l_i X_i = p \tag{41}$$

the $l$'s are the direction cosines of the normal and $p$ is the length
of the perpendicular from the origin on to the $(n-1)$-flat.



26  The angle between two $(n-1)$-flats may be defined as the
angle between their normals. Thus the angle between

$$\sum l_i X_i = p \tag{42}$$

and

$$\sum l_i' X_i = p' \tag{43}$$

is given by

$$\arccos\left(\sum l_i l_i'\right). \tag{44}$$

27  When we come to consider the angles between flats other
than $S_1$ and $S_{n-1}$ in $S_n$ we encounter a new idea which is not
visualizable in two or three dimensions. In fact, two flats may
have more than one angle between them.

Consider, for example, two planes in $S_4$. They have one
point in common. If we regard one plane as fixed and the
common point as fixed on it, the other plane can vary about this
point in a double infinity of ways; for two additional points are
required to fix it. By fixing one angle with the first plane we
do not determine the second plane completely. We need two
conditions, equivalent to two angles, to do so. We may
therefore expect to find two angles between the planes. Any
two angles, in a sense, would do. But we shall adopt a criterion
suggested by the definition of "the" angle between a line and a
plane in three dimensions. In fact, that angle is the minimum
angle between the given line and an arbitrary line in the plane.
We shall adopt this minimization principle.

28  Consider then an $S_{n-p}$ defined by $p$ equations of type

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = 0, \qquad i = 1, 2, \ldots, p. \tag{45}$$

Writing $A$ for the $p \times n$ matrix of coefficients and $X$ for the
column $(n \times 1)$ matrix of $x$'s, we have

$$AX = 0. \tag{46}$$

We lose no generality by supposing the $S_{n-p}$ to go through
the origin.

If, then, $L$ is the column vector representing a line in this
space we have, as at (40),

$$AL = 0 \tag{47}$$

subject to

$$L'L = 1. \tag{48}$$

Likewise in a second space $S_{n-q}$ determined by $q$ equations
with coefficients $B$ we have

$$BM = 0 \tag{49}$$

$$M'M = 1, \tag{50}$$

where $M$ represents an arbitrary line with direction cosines $m_i$.
The angle between a line in one space and a line in the other
is the angle $\phi$ whose cosine is

$$L'M = M'L = \cos\phi = R, \text{ say}. \tag{51}$$

We require to find stationary values of $R$ subject to the four
conditions (47)-(50). Take Lagrange multipliers $\lambda_1$, $\lambda_2$, $\alpha_1$, $\alpha_2$,
where $\lambda_1$ is a $(1 \times p)$ vector and $\lambda_2$ is a $(1 \times q)$ vector. Then we
require the unconditioned stationary values of

$$L'M - \lambda_1 AL - \lambda_2 BM - \alpha_1(L'L - 1) - \alpha_2(M'M - 1) \tag{52}$$

for variations in $L$ and $M$. Differentiating by $l_i$, $m_i$ in turn, we
have

$$M' - \lambda_1 A - 2\alpha_1 L' = 0 \tag{53}$$

$$L' - \lambda_2 B - 2\alpha_2 M' = 0. \tag{54}$$

Postmultiplying (53) by $L$, we have, in virtue of (47) and (48),

$$M'L = 2\alpha_1 = R \tag{55}$$

and similarly

$$L'M = 2\alpha_2 = R. \tag{56}$$

Hence

$$-\lambda_1 A = RL' - M' \tag{57}$$

$$-\lambda_2 B = RM' - L'. \tag{58}$$

Postmultiplying (57) by $A'$ and by $B'$, we have, since
$L'A' = (AL)' = 0$ and $M'B' = (BM)' = 0$,

$$-\lambda_1 AA' = -M'A' \tag{59}$$

$$-\lambda_1 AB' = RL'B', \tag{60}$$

and similarly, postmultiplying (58) by $B'$ and by $A'$,

$$-\lambda_2 BB' = -L'B' \tag{61}$$

$$-\lambda_2 BA' = RM'A'. \tag{62}$$

From (59) and (62) we then have

$$R\lambda_1 AA' + \lambda_2 BA' = 0, \tag{63}$$

and from (60) and (61) similarly

$$\lambda_1 AB' + R\lambda_2 BB' = 0. \tag{64}$$

Writing now

$$U = AA' \quad (\text{a symmetrical } p \times p \text{ matrix}), \tag{65}$$

$$V = BA' \quad (q \times p \text{ matrix}), \tag{66}$$

$$V' = (BA')' = AB' \quad (p \times q \text{ matrix}), \tag{67}$$

$$W = BB' \quad (\text{a symmetrical } q \times q \text{ matrix}), \tag{68}$$

we have

$$R\lambda_1 U + \lambda_2 V = 0, \qquad \lambda_1 V' + R\lambda_2 W = 0 \tag{69}$$

and eliminating $\lambda_1$, $\lambda_2$, we have

$$\begin{vmatrix} RU & V' \\ V & RW \end{vmatrix} = 0 \tag{70}$$

or, equivalently,

$$R^{q-p}\begin{vmatrix} R^2U & V' \\ V & W \end{vmatrix} = 0. \tag{71}$$

This is a $(p + q) \times (p + q)$ determinant of degree $p$ in $R^2$. It can
be solved for $R^2$ and hence there are $p$ stationary (minimal)
angles between the spaces. We may reduce (70) to a $p$-way
determinant as follows:

In (69) multiply the first equation by $R$ and postmultiply the second by
$W^{-1}V$, and subtract. We find

$$\lambda_1(R^2U - V'W^{-1}V) = 0$$

and hence

$$|R^2U - V'W^{-1}V| = 0. \tag{72}$$
29  Before considering an example let us pick up a few loose
ends. First, we are assuming that $U$ and $W$ have inverses, that
is to say, that $A$, $B$ are not degenerate. If they are, some of the
relations determining $S_{n-p}$ and $S_{n-q}$ are redundant and those
spaces are really of greater dimensions. We may suppose that
this point has been examined before the investigation begins.

Second, we have, in effect, assumed that $p \le q$. This point
cropped up in proceeding from (70) to (71). If $p > q$ we merely
invert the roles of the two spaces. It is the smaller of the
numbers $p$, $q$ which determines the number of non-zero
minimal angles between them; in (71) $q - p$ of the values of
$R^2$ are zero.

Third, the lines corresponding to the minimal angles are
orthogonal in their respective spaces.

In fact, $V'W^{-1}V$ and $U$ are symmetric matrices and hence
so is $U^{-1}V'W^{-1}V$. Call it $K$. We have

$$K\lambda_1 = R^2\lambda_1. \tag{73}$$

If $R$, $S$ are two different roots of (72), with corresponding
vectors $\lambda_1$, $\rho_1$, we also have

$$K\rho_1 = S^2\rho_1.$$

Thus

$$\lambda_1' K\rho_1 = S^2\lambda_1'\rho_1.$$

In virtue of the symmetry of $K$, $\rho_1' K\lambda_1 = \lambda_1' K\rho_1$ and hence

$$(R^2 - S^2)\lambda_1'\rho_1 = 0.$$

Thus, $R$ and $S$ being different, the vectors $\rho_1$ and $\lambda_1$ are
orthogonal. It then follows from (59) that the corresponding
$M$'s are orthogonal.

Example  8 

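Suppose we have two planes in $S_4$ determined by

$$x_1 + 7x_2 + x_3 = 0, \qquad x_4 = 0 \tag{74}$$

$$x_1 + x_2 = 0, \qquad x_3 + x_4 = 0. \tag{75}$$

We have

$$A = \begin{pmatrix} 1 & 7 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}.$$

Thus

$$AA' = \begin{pmatrix} 51 & 0 \\ 0 & 1 \end{pmatrix}, \quad BB' = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \quad AB' = \begin{pmatrix} 8 & 1 \\ 0 & 1 \end{pmatrix}, \quad BA' = \begin{pmatrix} 8 & 0 \\ 1 & 1 \end{pmatrix}.$$

Equation (71) is then

$$\begin{vmatrix} 51R^2 & 0 & 8 & 1 \\ 0 & R^2 & 0 & 1 \\ 8 & 0 & 2 & 0 \\ 1 & 1 & 0 & 2 \end{vmatrix} = 0, \tag{76}$$

which reduces to

$$51R^4 - 58R^2 + 16 = 0$$

$$(17R^2 - 8)(3R^2 - 2) = 0$$

$$R = \pm\sqrt{8/17} \quad \text{or} \quad \pm\sqrt{2/3}.$$

These are the cosines of the two angles between the planes.

Equivalently, from (72) we find, since

$$V'W^{-1}V = \begin{pmatrix} 65/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix},$$

$$\begin{vmatrix} 51R^2 - 65/2 & -1/2 \\ -1/2 & R^2 - 1/2 \end{vmatrix} = 0 \tag{77}$$

leading to the same result.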
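
The arithmetic of this example can be reproduced mechanically from (72). The sketch below is restricted to the case $p = q = 2$, where (72) is a quadratic in $R^2$; the helper names are illustrative assumptions, and the matrices are those of (74) and (75).

```python
import math

def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[1.0, 7.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
B = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

U = matmul(A, transpose(A))                 # AA'
V = matmul(B, transpose(A))                 # BA'
W = matmul(B, transpose(B))                 # BB'
M = matmul(transpose(V), matmul(inverse_2x2(W), V))   # V'W^{-1}V

# |z U - M| = 0 with z = R^2 is the quadratic qa z^2 + qb z + qc = 0.
qa = det2(U)
qb = -(U[0][0] * M[1][1] + U[1][1] * M[0][0]
       - U[0][1] * M[1][0] - U[1][0] * M[0][1])
qc = det2(M)
disc = math.sqrt(qb * qb - 4.0 * qa * qc)
r_squared = sorted([(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)])
cosines = [math.sqrt(z) for z in r_squared]
```

The quadratic has coefficients 51, -58 and 16, and `r_squared` contains $8/17$ and $2/3$, in agreement with the result above.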


Example  9 

It follows from our treatment that two $p$-flats in $S_{2p}$ have $p$
angles between them. In $S_{2p+1}$ two $p$-flats also have $p$ angles.
Whereas in $S_{2p}$ they meet in a point, in $S_{2p+1}$ they do not. One
root in $R$, as at (71), then vanishes. Thus there is a line, $S_1$,
perpendicular to both $p$-flats, and the distance between the
points where it intersects them is the "distance" between the
flats. (Cf. the case of three dimensions.)

Reciprocity 

30  In classical geometry there is a very useful reciprocal
relation between points and lines (in a plane) or between
points and planes (in three dimensions). In the plane, for
example, two points determine a common line; two lines
determine a common point. It will be found that the correspondence
exists in the other axioms or postulates of geometry,
with the result that we can translate propositions into a dual
form. For example, if a hexagon is inscribed in a conic, the
meets of opposite pairs of sides are collinear (Pascal's theorem).
Since (we assume this for the purpose of exemplification) the
conic traced by a point is dual to a conic enveloped by lines,
we have immediately the dual proposition: if a hexagon is
circumscribed to a conic the joins of opposite pairs of vertices
are concurrent (Brianchon's theorem). In $n$ dimensions a
point corresponds to an $(n-1)$-flat, an $S_p$ to an $S_{n-p-1}$ and so on.

Analytically we may consider the $(n-1)$-flat

$$l_1 X_1 + \cdots + l_n X_n = \text{constant} \tag{78}$$

from two points of view: given the $X$'s, the $l$'s obey a linear
relation; given the $l$'s, the $X$'s do so. The symmetrical bilinear
form (78) determines a (1, 1) relation between a family of points
determined by the $X$'s and a family of flats determined by the
$l$'s. This type of duality has not been much used in a statistical
context.

31  There is, however, a second and distinct kind of duality
which should be noticed. If $n$ points in $p$ dimensions are given
by the array

$$\begin{matrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pn} \end{matrix} \tag{79}$$

we may, so to speak, read the matrix downwards (transpose it)
so as to get

$$\begin{matrix} x_{11} & \cdots & x_{p1} \\ \vdots & & \vdots \\ x_{1n} & \cdots & x_{pn} \end{matrix} \tag{80}$$

We may thus regard the array as corresponding to a set of $p$
points in $n$ dimensions, rather than $n$ points in $p$ dimensions.
This kind of duality between spaces of $n$ and $p$ dimensions will
be exemplified later.

Solid  angles 

32  In three dimensions we find the notion of "solid angle" of
a cone. This is the area which the cone cuts off on the surface
of a unit sphere centred at the apex of the cone. Likewise, in
$n$ dimensions, we can determine a solid angle of a hypercone
by considering the content (volume) of a region cut off on the
hypersphere centred at the apex of the cone. This kind of
measure, though termed "angle", is really related to "content",
and we shall discuss below the interpretation to be placed upon
that quantity, which corresponds to area in the plane and volume
in three dimensions.

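Tangent (n - 1)-flats

33  An equation such as

$$f(X_1, \ldots, X_n) = 0 \tag{81}$$

determines a $V_{n-1}$ in $S_n$. Let $x_1, \ldots, x_n$ be a point on the $V_{n-1}$.
By Taylor's theorem we have

$$f(X_1, \ldots, X_n) = f(x_1, \ldots, x_n) + \sum \left(\frac{\partial f}{\partial X_i}\right)_{x_i}(X_i - x_i) + \text{terms of higher order in } (X_i - x_i). \tag{82}$$

Thus at the point $x$ the $(n-1)$-flat

$$\sum (X_i - x_i)\left(\frac{\partial f}{\partial X_i}\right)_{x_i} = 0 \tag{83}$$

has first-order contact with the $V_{n-1}$. It is called the tangent
$(n-1)$-flat.

It follows that the line

$$\frac{X_1 - x_1}{\left(\dfrac{\partial f}{\partial X_1}\right)_{x_i}} = \frac{X_2 - x_2}{\left(\dfrac{\partial f}{\partial X_2}\right)_{x_i}} = \cdots = \frac{X_n - x_n}{\left(\dfrac{\partial f}{\partial X_n}\right)_{x_i}} \tag{84}$$

passes through the point $x_1, \ldots, x_n$ and is perpendicular to the
tangent $(n-1)$-flat there. It is called the normal to the $V_{n-1}$ at
that point.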
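
A small numerical illustration of (83) and (84) follows. The quadric used, the point on it and the finite-difference step are hypothetical choices, and the derivatives are approximated by central differences, not taken analytically.

```python
def gradient(f, x, h=1e-6):
    """Approximate the partial derivatives of f at x by central differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def tangent_flat(f, x):
    """Return (g, c) so that the tangent flat (83) reads sum g_i X_i = c."""
    g = gradient(f, x)
    c = sum(gi * xi for gi, xi in zip(g, x))
    return g, c

def quadric(X):
    # Illustrative V_2 in S_3: X_1^2 + 2 X_2^2 + 3 X_3^2 - 6 = 0
    return X[0] ** 2 + 2.0 * X[1] ** 2 + 3.0 * X[2] ** 2 - 6.0

point = [1.0, 1.0, 1.0]                 # lies on the quadric
normal, constant = tangent_flat(quadric, point)
```

For this point the tangent flat is approximately $2X_1 + 4X_2 + 6X_3 = 12$, and `normal` gives direction numbers of the normal line (84).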

Reduction  of  a  quadric  to  canonical  form 

34  Consider the quadric $V_{n-1}$

$$\sum a_{ij} X_i X_j + \sum b_i X_i + c = 0. \tag{85}$$

Let us make this homogeneous by introducing a new dummy
variable $X_{n+1}$ equal to unity and writing $b_i = a_{i,n+1} + a_{n+1,i}$,
$c = a_{n+1,n+1}$. Then (85) may be written

$$\sum_{i,j=1}^{n+1} a_{ij} X_i X_j = 0 \tag{86}$$

or, in matrix form,

$$X'AX = 0, \tag{87}$$

where $A$ is the matrix $(a_{ij})$ and $X$ is the column vector
$(X_1, X_2, \ldots, X_{n+1})$.

Let us make an orthogonal transformation to new variables $Y$
given by

$$X = LY. \tag{88}$$

The form (87) then becomes

$$Y'L'ALY = 0. \tag{89}$$

If we now can find a diagonal matrix

$$\Lambda = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_{n+1} \end{pmatrix}$$

such that

$$L'AL = \Lambda, \tag{90}$$

our form (89) reduces to

$$Y'\Lambda Y = \lambda_1 Y_1^2 + \lambda_2 Y_2^2 + \cdots + \lambda_{n+1} Y_{n+1}^2 = 0. \tag{91}$$

The form is then said to be canonical.
For (90) to be true we must have

$$LL'AL = L\Lambda$$

and since, by hypothesis, $LL' = I$, we have

$$AL = L\Lambda. \tag{92}$$

Now let $l_i$ be the $i$th column vector in $L$. We shall have, in
virtue of (92) and the diagonal character of $\Lambda$,

$$Al_i - \lambda_i l_i = 0 \tag{93}$$

and hence

$$|A - \lambda I| = 0. \tag{94}$$

The equation $|A - \lambda I| = 0$ is of degree $n + 1$ in $\lambda$. It is clear
that its roots are the $n + 1$ values of $\lambda$ in $\Lambda$, for (93) is true of
any $i$. Thus, if we can solve (94) we can find $l_i$ from (93)
together with the orthogonality condition $l_i'l_i = 1$; hence we can
find $L$, which gives the required transformation to the canonical
form (91). It will, in general, be unique.

35  The following points are to be noted.

(a)  In practice it is normally more convenient to remove
the linear terms from (85) by a transformation of the
origin before reducing to canonical form, which then
becomes

$$\sum_{i=1}^{n} \lambda_i Y_i^2 = \text{constant}. \tag{95}$$

The constant on the right is almost always positive in
statistical applications.

(b)  We quote without proof a result from matrix theory
to the effect that if $A$ is a symmetric real matrix (which
is so in our case since we may take $a_{rs} = a_{sr}$ without
loss of generality) then all the roots in $\lambda$ are real.
Accordingly the transformation is real.

(c)  If the form $X'AX$ is positive definite then all the roots
$\lambda$ are positive; for otherwise the canonical form (91)
could be negative or zero. This case is of particular
interest in statistics.

(d)  In degenerate cases some $\lambda$'s may be zero. The form
(91) then contains fewer than $n$ variables, and $X'AX$
is a variety of lower dimension.

(e)  If certain $\lambda$'s are equal the transformation is not unique.
If, for example, $\lambda_i$ and $\lambda_j$ are equal, $l_i$ and $l_j$ become
indeterminate unless a further condition is imposed.
The transformation then has a single infinity of freedom.
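
The reduction (90) can be carried out explicitly when there are only two variables, since the orthogonal matrix $L$ is then a plane rotation. The sketch below is limited to that $2 \times 2$ case; the rotation-angle formula is the standard one for a symmetric $2 \times 2$ matrix, and the sample matrix and function names are illustrative assumptions.

```python
import math

def matmul(p, q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def rotation_to_canonical(a):
    """Return an orthogonal L with L'AL diagonal, for symmetric 2 x 2 A."""
    theta = 0.5 * math.atan2(2.0 * a[0][1], a[0][0] - a[1][1])
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]              # columns are the vectors l_1, l_2

A = [[5.0, 2.0], [2.0, 2.0]]              # illustrative positive definite form
L = rotation_to_canonical(A)
Lambda = matmul(transpose(L), matmul(A, L))
roots = sorted([Lambda[0][0], Lambda[1][1]])
```

For this matrix the roots of $|A - \lambda I| = 0$ are 1 and 6; both are positive, as point (c) requires of a positive definite form.
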

36  From the purely geometrical viewpoint the transformation
we are here considering is the analogue of transformation to the
principal axes of a conic in two dimensions or a quadric in
three dimensions. Equation (95) may be regarded as a hyperellipsoid
if all the $\lambda$'s are positive and a hyper-hyperboloid if
some are negative. The case when certain $\lambda$'s are equal
corresponds to circular symmetry.

37  The transformation we have so far considered is a
"rotation" of the coordinate axes. If we care to make a further
transformation of scale on each axis of the type

$$Z_i = Y_i\sqrt{\lambda_i} \tag{96}$$

equation (95) becomes simply

$$\sum_{i=1}^{n} Z_i^2 = \text{constant},$$

corresponding to a hypersphere. This is the analogue of the
theorem of two-dimensional geometry that a conic can be
projected into a circle.

It follows, since a hypersphere is invariant under any further rotation
of the axes, that if we have two quadratic forms

$$\sum_{i,j=1}^{n} a_{ij} X_i X_j = Q_1, \qquad \sum_{i,j=1}^{n} b_{ij} X_i X_j = Q_2 \tag{97}$$

there exists a linear transformation of the coordinates which will
reduce them simultaneously to canonical form. Suppose that
the transformation is

$$X = CY.$$

The two forms become

$$Y'C'ACY = Q_1, \qquad Y'C'BCY = Q_2. \tag{98}$$

If

$$C'AC = \lambda, \text{ a diagonal matrix}, \qquad C'BC = \mu, \text{ a diagonal matrix}, \tag{99}$$

the forms become

$$Y'\lambda Y = Q_1, \qquad Y'\mu Y = Q_2. \tag{100}$$

Now $\mu^{-1}\lambda$ is also diagonal, say $\nu$, and is equal, from (99), to

$$(C'BC)^{-1}(C'AC) = C^{-1}B^{-1}C'^{-1}C'AC = C^{-1}B^{-1}AC. \tag{101}$$

Writing $D$ for $B^{-1}A$ we then have

$$C^{-1}DC = \nu \tag{102}$$

$$DC = C\nu$$

$$DC - C\nu = 0, \tag{103}$$

which is of the same form as (92). Thus, given $A$, $B$ (and
hence $D$) we find the roots in $\nu$ of

$$|B^{-1}A - \nu I| = 0 \tag{104}$$

and this gives us the required transformation. Equation (104)
is equivalent to

$$|A - \nu B| = 0. \tag{105}$$

We may, in fact, regard (94) as a particular case of (105) with
$B = I$.
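
For two variables, equation (105) is a quadratic in $\nu$ and the columns of $C$ can be written down directly. The following sketch is confined to that case and assumes real, distinct roots; the matrices and function names are illustrative only.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def generalized_roots(a, b):
    """Roots nu of |A - nu B| = 0 for 2 x 2 matrices (real roots assumed)."""
    qa = det2(b)
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0]
           - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    qc = det2(a)
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    return sorted([(-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)])

def column_for_root(a, b, nu):
    """A vector c with (A - nu B) c = 0; valid when the first row is non-zero."""
    m00 = a[0][0] - nu * b[0][0]
    m01 = a[0][1] - nu * b[0][1]
    return [m01, -m00]

def bilinear(u, m, v):
    """The scalar u' M v."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.0], [0.0, 2.0]]
nu1, nu2 = generalized_roots(A, B)
c1 = column_for_root(A, B, nu1)
c2 = column_for_root(A, B, nu2)
```

Here the roots are 1 and 2.5, and the off-diagonal elements `bilinear(c1, A, c2)` and `bilinear(c1, B, c2)` both vanish, so that the matrix with columns `c1`, `c2` reduces the two forms simultaneously, as (99) requires.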

38  The  manual  solution  of  equations  like  (92)  and  (104)  for 
more  than  two  dimensions  is  a  somewhat  tedious  process  which 
is  best  carried  out  by  an  iterative  procedure,  for  the  details  of 
which  see  Kendall  (1957).  Most  electronic  computers  may  be 
programmed  to  solve  them  and  at  the  same  time  give  the 
corresponding  transformation. 

The projection of an angle

39  Consider a line in $S_n$ through the origin $O$ and a point $P$ a
unit distance from $O$. If the direction cosines of the line are
$l_1, \ldots, l_n$, the coordinates of $P$ are also $l_1, \ldots, l_n$. If we project
$P$ on to, say, $P'$, in the plane $X_n = 0$, the coordinates of
$P'$ are $l_1, \ldots, l_{n-1}$ and the line $OP'$ has direction cosines
proportional to $l_1, \ldots, l_{n-1}$. Similarly for a line with direction
cosines typified by $l_i'$.

The cosine of the angle between the original lines is $\sum l_i l_i'$. That of
the angle between the projected lines is

$$\frac{\sum_{i=1}^{n-1} l_i l_i'}{\{(1 - l_n^2)(1 - l_n'^2)\}^{1/2}} \tag{106}$$

the denominator arising from the fact that the sum of squares
of a set of direction cosines is unity. We may write (106) as

$$\frac{\sum_{i=1}^{n} l_i l_i' - l_n l_n'}{\{(1 - l_n^2)(1 - l_n'^2)\}^{1/2}}. \tag{107}$$

Now angles are invariant under rotation of axes. Thus, if two
lines make an angle $\theta$ and we project them on to an arbitrary
plane whose normal makes angles $\alpha$, $\beta$ with them, the angle
between the projections, say $\phi$, is given by

$$\cos\phi = \frac{\cos\theta - \cos\alpha\cos\beta}{\{(1 - \cos^2\alpha)(1 - \cos^2\beta)\}^{1/2}}. \tag{108}$$

This formula, in another guise, is familiar in statistics as the
relation between a partial correlation and total correlations.
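
Formula (108) can be compared with a direct computation in which the two lines are actually projected on to the flat orthogonal to a given normal. The sketch below does so in four dimensions; the vectors and function names are illustrative assumptions.

```python
import math

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out(v, normal):
    """Orthogonal projection of v on to the flat whose unit normal is given."""
    k = dot(v, normal)
    return [vi - k * ni for vi, ni in zip(v, normal)]

def projected_cosine(l, m, normal):
    """Cosine of the angle between the projections, computed directly."""
    pl, pm = project_out(l, normal), project_out(m, normal)
    return dot(pl, pm) / math.sqrt(dot(pl, pl) * dot(pm, pm))

def formula_108(cos_theta, cos_alpha, cos_beta):
    """Right-hand side of (108)."""
    return ((cos_theta - cos_alpha * cos_beta)
            / math.sqrt((1.0 - cos_alpha ** 2) * (1.0 - cos_beta ** 2)))

l = unit([1.0, 2.0, 0.0, 1.0])
m = unit([2.0, -1.0, 1.0, 0.0])
normal = unit([1.0, 1.0, 1.0, 1.0])

direct = projected_cosine(l, m, normal)
via_formula = formula_108(dot(l, m), dot(l, normal), dot(m, normal))
```

The two values agree. With the cosines read as total correlations $r_{12}$, $r_{13}$, $r_{23}$, the same function gives the partial correlation $r_{12.3}$.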

Content 

40  By definition the content ($n$-dimensional volume) of a
closed $V_{n-1}$ is the $n$-fold integral

$$\int \cdots \int dx_1 \cdots dx_n$$

taken over the "inside" of the hypersurface. It will be evident
that the content of a hypercube of side length $a$ is $a^n$. A
rectangular parallelotope of sides $a_1, \ldots, a_n$ has content
$a_1 a_2 \cdots a_n$. We proceed to find the content of some of the
simpler $V_{n-1}$'s.

Content  of  a  hypersphere 

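41  Let the hypersphere of radius $a$ be determined by

$$\sum_{i=1}^{n} x_i^2 = a^2.$$

Make a polar transformation of the type considered in section
21. The integral giving the content becomes

$$\int_0^a r^{n-1}\, dr \int_{-\frac{1}{2}\pi}^{\frac{1}{2}\pi} d\theta_1 \int_{-\frac{1}{2}\pi}^{\frac{1}{2}\pi} d\theta_2 \cdots \int_0^{2\pi} d\theta_{n-1}\; c_1^{n-2} c_2^{n-3} \cdots c_{n-2}. \tag{109}$$

All the variables are independent. We have, putting $\cos^2\theta = u$,

$$\int_{-\frac{1}{2}\pi}^{\frac{1}{2}\pi} \cos^j\theta\, d\theta = 2\int_0^1 \frac{u^{j/2}\, du}{2\cos\theta\sin\theta} = \int_0^1 (1 - u)^{-1/2} u^{(j-1)/2}\, du = B\{\tfrac{1}{2}, \tfrac{1}{2}(j + 1)\} = \frac{\Gamma(\tfrac{1}{2})\Gamma\{\tfrac{1}{2}(j + 1)\}}{\Gamma\{\tfrac{1}{2}(j + 2)\}}.$$

Substituting such results in (109) we find for the content $C$

$$C = \frac{a^n}{n} \cdot \frac{\Gamma(\tfrac{1}{2})\Gamma\{\tfrac{1}{2}(n - 1)\}}{\Gamma(\tfrac{1}{2}n)} \cdot \frac{\Gamma(\tfrac{1}{2})\Gamma\{\tfrac{1}{2}(n - 2)\}}{\Gamma\{\tfrac{1}{2}(n - 1)\}} \cdots \frac{\Gamma(\tfrac{1}{2})\Gamma(1)}{\Gamma(\tfrac{3}{2})} \cdot 2\pi = \frac{2a^n\pi^{n/2}}{n\Gamma(\tfrac{1}{2}n)}. \tag{110}$$

For example, with $n = 2$ we have $\pi a^2$, the area of a circle;
with $n = 3$ we have $4\pi a^3/3$, the volume of a sphere; and so on.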

42  The analogue of the "surface area" of a sphere is derivable
immediately. If we differentiate the content of a hypersphere
with respect to $a$ we obtain the surface content. Thus the
surface content is given by

$$\frac{2a^{n-1}\pi^{n/2}}{\Gamma(\tfrac{1}{2}n)}. \tag{111}$$

Thus, for $n = 2$ we have $2\pi a$, the circumference of a circle; for
$n = 3$ we have $4\pi a^2$, the surface of a sphere; and so on.
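
Formulae (110) and (111) are readily evaluated with the gamma function of a standard library. The sketch below uses Python's `math.gamma`; the function names are illustrative.

```python
import math

def hypersphere_content(n, a=1.0):
    """Content of a hypersphere of radius a in n dimensions, formula (110)."""
    return 2.0 * a ** n * math.pi ** (n / 2.0) / (n * math.gamma(n / 2.0))

def hypersphere_surface(n, a=1.0):
    """Surface content, formula (111): the derivative of (110) in a."""
    return 2.0 * a ** (n - 1) * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

a = 1.5
area_of_circle = hypersphere_content(2, a)      # pi a^2
volume_of_sphere = hypersphere_content(3, a)    # 4 pi a^3 / 3
circumference = hypersphere_surface(2, a)       # 2 pi a
surface_of_sphere = hypersphere_surface(3, a)   # 4 pi a^2
```

The four values reproduce the familiar results quoted above for $n = 2$ and $n = 3$; higher values of $n$ may be substituted in the same way.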

Content  of  a  hyperellipsoid 

43  Content being independent of coordinate axes, we may,
without loss of generality, suppose the hyperellipsoid put in
the form

$$\frac{X_1^2}{a_1^2} + \frac{X_2^2}{a_2^2} + \cdots + \frac{X_n^2}{a_n^2} = 1. \tag{112}$$

The lengths $a_1, \ldots, a_n$ are the semi-axes of the hyperellipsoid.
A volume integral over a region determined by (112) is seen, by
the transformation $X_i = a_i Y_i$, to be equal to $a_1 a_2 \cdots a_n$ times
the content of a unit sphere. Thus our required content is

$$\frac{2a_1 \cdots a_n \pi^{n/2}}{n\Gamma(\tfrac{1}{2}n)}. \tag{113}$$


Content  of  a  hyperprism 

44  A hyperprism is the figure generated by a flat region of
content $C$ in $S_{n-1}$ moving parallel to itself along a straight line
which makes an angle $\theta$, say, with the $S_{n-1}$. It scarcely needs
proof that if the distance moved is $h$ the content of the hyperprism
is $Ch\sin\theta$.

In particular if the flat region is a hypersphere and the line is
perpendicular to it we have the $n$-dimensional analogue of a
cylinder. The content is then $2ha^{n-1}\pi^{\frac{1}{2}(n-1)}/[(n - 1)\Gamma\{\tfrac{1}{2}(n - 1)\}]$.

Content  of  a  parallelotope 

45  Let $x_{ij}$, $i, j = 1, 2, \ldots, n$ be $n$ points $P_1, \ldots, P_n$ in an $S_n$.
Consider them together with the origin $O$. Join $O$ to each $P$
and complete the parallelotope of which $O$ is one corner, by
drawing lines through $P_1$ parallel to $OP_2, \ldots, OP_n$, etc. Let us
find the content of this parallelotope.

Make a variate transformation (linear but not orthogonal)

$$X = AY. \tag{114}$$

The Jacobian of the transformation is simply $|A|$. Now let the
new variables $Y$ be orthogonal and the points $P$ transform to
points with unit distance from the origin. The content of the
parallelotope is then simply $|A|$ and we have only to express
this in terms of the coordinates of the $P$'s. But from (114)
itself we have, for unit values of the $Y$'s, $Y$ then becoming the
identity matrix, $X = A$. Hence our content is simply $|X|$.

If now our parallelotope is defined by $(n + 1)$ points,
excluding the origin, say $x_{ij}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n + 1$, we
may translate the origin to one of them, say $x_{i,n+1}$, and the
coordinates of the others are then $x_{ij} - x_{i,n+1}$. The content is
then

$$|x_{ij} - x_{i,n+1}|, \qquad i, j = 1, 2, \ldots, n,$$

which we may write in the symmetrical form

$$C = \begin{vmatrix}
x_{11} & x_{12} & \cdots & x_{1n} & x_{1,n+1} \\
x_{21} & x_{22} & \cdots & x_{2n} & x_{2,n+1} \\
\vdots & \vdots & & \vdots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nn} & x_{n,n+1} \\
1 & 1 & \cdots & 1 & 1
\end{vmatrix}. \tag{115}$$


Content  of  a  hyperpyramid 

46  A hyperpyramid is the figure generated by joining the
vertices of a polytope in $S_{n-1}$ to a point not in that space
(called the apex).

Take a set of coordinates in $S_{n-1}$ and let the remaining
coordinate be measured along a line through the apex
perpendicular to the $S_{n-1}$. A hyperplane parallel to the $S_{n-1}$
cuts the pyramid in a polytope which is "similar" to the base
polytope. Let the content of the base be $C$, and the height of
the pyramid (the non-zero coordinate of the vertex) be $h$. A
hyperplane at $x_n$ parallel to the base then cuts the pyramid in a
polytope of content $Cx_n^{n-1}/h^{n-1}$. The content of the pyramid
is then

$$\int_0^h \frac{Cx_n^{n-1}}{h^{n-1}}\, dx_n = \frac{Ch}{n}. \tag{116}$$

Content  of  a  simplex 

47  Consider  the  simplex  as  a  pyramid  with  one  point,  say  Pn, 
as  vertex.  If  Cn_1  is  the  content  of  the  base,  we  have  from  the 
foregoing  result  for  Cn,  the  content  of  the  simplex. 

Let the vertices of the simplex be $x_{ij}$, $i, j = 1, 2, \ldots, n$
together with the origin. Make a transformation as in section
45. The content of the parallelotope defined by these points
is $|X|$, the vertices of the simplex transforming into the origin
and the points at unit distance from it along the coordinate axes.
Consider the content of this transformed simplex, say $D_n$.
Regarding it as a pyramid with one vertex at $P_n$ we have,
from (116),

$$D_n = \frac{1}{n}D_{n-1},$$

and by repetition of the process

$$D_n = \frac{1}{n!}.$$

Thus the content of the original simplex is $1/n!$ times the
corresponding parallelotope. If it is defined by $(n + 1)$ points,
excluding the origin, the content is

$$\frac{1}{n!}C \tag{118}$$

where $C$ is given by (115).
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48  Formula  (118)  gives  the  content  of  the  simplex  in  terms 
of  the  coordinates  of  its  vertices.  We  may  also  derive  an 
expression  in  terms  of  the  lengths  of  its  S^-edges.  Writing  D 
for  the  content  we  have 


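$$n!\,D = \begin{vmatrix} x_{11} & x_{12} & \cdots & x_{1,n+1} \\ x_{21} & x_{22} & \cdots & x_{2,n+1} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{n,n+1} \\ 1 & 1 & \cdots & 1 \end{vmatrix}$$

which we may write as

$$(-1)^n 2^n n!\,D = \begin{vmatrix} 1 & \sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i2}^2 & \cdots & \sum_{i=1}^{n} x_{i,n+1}^2 \\ 0 & -2x_{11} & -2x_{12} & \cdots & -2x_{1,n+1} \\ 0 & -2x_{21} & -2x_{22} & \cdots & -2x_{2,n+1} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & -2x_{n1} & -2x_{n2} & \cdots & -2x_{n,n+1} \\ 0 & 1 & 1 & \cdots & 1 \end{vmatrix} \tag{119}$$

or equivalently

$$(-1)^{2n+1} n!\,D = \begin{vmatrix} 0 & 0 & 0 & \cdots & 0 & 1 \\ 1 & x_{11} & x_{21} & \cdots & x_{n1} & \sum x_{i1}^2 \\ 1 & x_{12} & x_{22} & \cdots & x_{n2} & \sum x_{i2}^2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & x_{1,n+1} & x_{2,n+1} & \cdots & x_{n,n+1} & \sum x_{i,n+1}^2 \end{vmatrix}. \tag{120}$$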
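Postmultiply (120) by (119). In virtue of relations such as

$$\sum_i x_{ij}^2 - 2\sum_i x_{ij}x_{ik} + \sum_i x_{ik}^2 = \sum_i (x_{ij} - x_{ik})^2 = \begin{cases} a_{jk}^2, & j \neq k \\ 0, & j = k \end{cases} \tag{121}$$

where $a_{jk}$ is the length of the edge determined by $x_j$ and $x_k$, we find

$$(-1)^{n+1} 2^n (n!)^2 D^2 = \begin{vmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & a_{12}^2 & \cdots & a_{1,n+1}^2 \\ 1 & a_{21}^2 & 0 & \cdots & a_{2,n+1}^2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_{n+1,1}^2 & a_{n+1,2}^2 & \cdots & 0 \end{vmatrix}. \tag{122}$$
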

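For the regular simplex all the $a_{ij}$ are equal, say to $a$. Substituting in (122) and subtracting the last column from the others except the first we find

$$(-1)^{n+1} 2^n (n!)^2 D^2 = \begin{vmatrix} 0 & 0 & 0 & \cdots & 0 & 1 \\ 1 & -a^2 & 0 & \cdots & 0 & a^2 \\ 1 & 0 & -a^2 & \cdots & 0 & a^2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & 0 & 0 & \cdots & -a^2 & a^2 \\ 1 & a^2 & a^2 & \cdots & a^2 & 0 \end{vmatrix} = (-1)^{n+1} a^{2n} \begin{vmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & -1 \\ 1 & 1 & 1 & \cdots & 1 \end{vmatrix} = (-1)^{n+1}(n+1)\,a^{2n},$$

giving

$$D = \frac{a^n}{n!}\sqrt{\frac{n+1}{2^n}}. \tag{123}$$

For example with $n = 3$ we have for the volume of a regular tetrahedron $a^3/(6\sqrt{2})$, a result which is easy to verify directly.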
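
The results (115), (118), (122) and (123) lend themselves to a numerical check. The following is a minimal sketch in Python, assuming the NumPy library is available; the function names are ours and purely illustrative. It computes the content of a simplex once from the coordinates of its vertices and once from its squared edge-lengths, and compares both with the closed form for the regular simplex.

```python
import math
import numpy as np

def simplex_content_from_vertices(vertices):
    """Content of a simplex from its n+1 vertices (rows) in n dimensions: |C| / n!."""
    v = np.asarray(vertices, dtype=float)
    n = v.shape[1]
    # Columns are the vertices; the last row is a row of ones, as in (115).
    m = np.vstack([v.T, np.ones(n + 1)])
    return abs(np.linalg.det(m)) / math.factorial(n)

def simplex_content_from_edges(sq_edges):
    """Content from the (n+1) x (n+1) matrix of squared edge-lengths, using (122)."""
    a2 = np.asarray(sq_edges, dtype=float)
    n = a2.shape[0] - 1
    bordered = np.ones((n + 2, n + 2))   # border of ones
    bordered[0, 0] = 0.0                 # zero in the corner
    bordered[1:, 1:] = a2                # squared edge-lengths, zero diagonal
    d2 = (-1) ** (n + 1) * np.linalg.det(bordered) / (2 ** n * math.factorial(n) ** 2)
    return math.sqrt(max(d2, 0.0))

def regular_simplex_content(n, a):
    """Closed form (123) for the regular simplex of edge a."""
    return a ** n / math.factorial(n) * math.sqrt((n + 1) / 2 ** n)

# A regular tetrahedron of edge sqrt(2): four alternate corners of the unit cube.
tetra = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
sq_edges = ((tetra[:, None, :] - tetra[None, :, :]) ** 2).sum(axis=2)

by_vertices = simplex_content_from_vertices(tetra)
by_edges = simplex_content_from_edges(sq_edges)
by_formula = regular_simplex_content(3, math.sqrt(2.0))
```

All three quantities should agree, the tetrahedron chosen here having content one-third.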

49  We  shall  not  be  concerned  with  the  differential  geometry 
of  n  dimensions  to  any  considerable  extent,  but  two  points  of 
interest  may  be  remarked  upon  before  we  proceed  to  statistical 
applications. 




(a) In transforming from one coordinate system to another, say from $x$'s to $y$'s, the content of an elementary volume $dx_1 \ldots dx_n$ transforms to $J\,dy_1 \ldots dy_n$ where $J$ is the Jacobian $\partial(x_1, x_2, \ldots, x_n)/\partial(y_1, y_2, \ldots, y_n)$. We have used this result repeatedly in the foregoing and it needed no ad hoc geometrical proof, being a result in analysis which is derived in most textbooks on the real variable. It may, however, be regarded from a geometrical viewpoint. If we consider a small displacement from $x_i$ to $x_i + dx_i$, the corresponding point in the $y$-domain has coordinates

$$y_j + \sum_{i=1}^{n} \frac{\partial y_j}{\partial x_i}\,dx_i.$$

The content of the parallelotope corresponding to $dx_1 \ldots dx_n$ is then, after the manner of section 45, equal to

$$\left|\frac{\partial x_i}{\partial y_j}\right| dy_1 \ldots dy_n = J\,dy_1 \ldots dy_n.$$


(b) If we make a coordinate transformation from $x$ to $y$, the lines $y_i$ = constant trace out a coordinate mesh in the $x$ domain and vice versa. The normal to the hypersurface ($V_{n-1}$) given by $y_i$ = constant has direction cosines (cf. section 33) proportional to

$$\frac{\partial x_j}{\partial y_i}, \qquad j = 1, 2, \ldots, n.$$

In particular, consider the polar transformation of section 21. We have

$$\frac{\partial x_1}{\partial r} = c_1 c_2 \cdots c_{n-2} c_{n-1}, \qquad \frac{\partial x_1}{\partial \theta_1} = -r s_1 c_2 \cdots c_{n-2} c_{n-1},$$

$$\frac{\partial x_2}{\partial r} = c_1 c_2 \cdots c_{n-2} s_{n-1}, \qquad \frac{\partial x_2}{\partial \theta_1} = -r s_1 c_2 \cdots c_{n-2} s_{n-1},$$

$$\cdots$$

$$\frac{\partial x_n}{\partial r} = s_1, \qquad \frac{\partial x_n}{\partial \theta_1} = r c_1.$$

Hence

$$\sum_j \frac{\partial x_j}{\partial r}\,\frac{\partial x_j}{\partial \theta_1} = -r c_1 s_1 \{c_2^2 \cdots c_{n-1}^2 + c_2^2 \cdots s_{n-1}^2 + \text{etc.}\} + r s_1 c_1 = 0$$

and hence the hypersurfaces $r$ = constant and $\theta_1$ = constant are orthogonal. Likewise it will be found that $r$ = constant is orthogonal to every $\theta$ = constant and that the latter are orthogonal among themselves. It is easy to verify this situation in two and three dimensions.

PART 2

STATISTICAL  APPLICATIONS 

Introduction 

50 The normal distribution with mean $\mu$ and variance $\sigma^2$ is given by the density (frequency) function

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} dx. \tag{124}$$

The distribution of a sample of $n$ independent values from such a population is then given by the joint frequency element

$$dF = \frac{1}{\sigma^n (2\pi)^{\frac{1}{2}n}} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right\} dx_1 \ldots dx_n. \tag{125}$$

Given a statistic $t = f(x_1, x_2, \ldots, x_n)$ we can find its distribution function by integrating (125) over a region for which $t \le t_0$, a given value of $t$. The boundary of this region is the $V_{n-1}$ in the $S_n$ space of $(x_1, x_2, \ldots, x_n)$ given by $t = t_0$. If we regard

$$\frac{1}{\sigma^n (2\pi)^{\frac{1}{2}n}} \exp\left\{-\frac{1}{2\sigma^2}\sum (x-\mu)^2\right\} \tag{126}$$

as a "density" in the $S_n$, our problem is then to find the "weight" in the region bounded by $t = t_0$. Equivalently, we may seek the element of weight in the range $dt$, which gives us the frequency, as distinct from the distribution, function of $t$. It is to this problem that we proceed to apply our $n$-dimensional geometry.

First of all, by a simple transformation $x_i - \mu = y_i$ we may write the density without the term in $\mu$. This will modify the boundary $t_0$. Our density function is then

$$\frac{1}{\sigma^n (2\pi)^{\frac{1}{2}n}} \exp\left(-\frac{1}{2\sigma^2}\sum y_i^2\right) = \frac{1}{\sigma^n (2\pi)^{\frac{1}{2}n}} \exp\left(-\frac{r^2}{2\sigma^2}\right) \tag{127}$$

where

$$r^2 = \sum_{i=1}^{n} y_i^2.$$

Our surfaces of constant density are then hyperspheres. It is to this spherical symmetry that the relative simplicity of our results is due.

We have already observed in Example 7 that when $\sigma = 1$ a polar transformation gives us for the distribution of $r$

$$dF \propto \exp(-\tfrac{1}{2}r^2)\, r^{n-1}\, dr. \tag{128}$$

It follows at once that for non-unit $\sigma$ the distribution is

$$dF \propto \frac{r^{n-1}}{\sigma^n} \exp\left(-\frac{1}{2}\frac{r^2}{\sigma^2}\right) dr. \tag{129}$$

We also remarked in Examples 5 and 6 that a variate transformation could be made under which one new variable was $\bar{x}$ and this was independent of the others. It follows that the sum $\sum (x - \bar{x})^2$ is distributed in the form (129) with one lower dimension number, i.e. with $n-2$ as the power of $r$ instead of $n-1$.

It also follows that if the $x$'s are subject to a homogeneous linear restriction

$$a_1 x_1 + \ldots + a_n x_n = 0 \tag{130}$$

the variation lies in the $S_{n-1}$ determined by (130) and the surfaces of constant density are hyperspheres $V_{n-2}$. Thus the distribution of

$$\sum_{i=1}^{n} x_i^2$$

is of the form (129) except, again, that $n-1$ is to be replaced by $n-2$.

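Consider now a positive definite quadratic form in the $x$'s

$$Q = \sum a_{ij} x_i x_j. \tag{131}$$

We can transform to new variables $y$ such that

$$Q = \sum \lambda_i y_i^2. \tag{132}$$

At the same time the distribution is transformed into

$$dF = \frac{1}{\sigma^n (2\pi)^{\frac{1}{2}n}} \exp\left(-\frac{1}{2\sigma^2}\sum y_i^2\right) dy_1 \ldots dy_n. \tag{133}$$

The surfaces $Q$ = constant and the hyperspheres of constant density no longer intersect in flats or hyperspheres and consequently no simple result is derivable for the distribution of $Q$. We may, however, use (132) and (133) to determine the moments of $Q$ very simply. For example,

$$E(Q) = \int \sum \lambda_i y_i^2\, dF = \sigma^2 \sum \lambda_i,$$

$$E(Q^2) = \int \left(\sum \lambda_i y_i^2\right)^2 dF = 3\sigma^4 \sum \lambda_i^2 + \sigma^4 \sum_{i \neq j} \lambda_i \lambda_j = \sigma^4 \left\{\left(\sum \lambda_i\right)^2 + 2\sum \lambda_i^2\right\},$$

and so on.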
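
The two moments just written down may be checked without any sampling. The sketch below, in Python with NumPy assumed, evaluates the integrals by Gauss-Hermite quadrature on a product grid, which is exact for polynomials of the low degree occurring here. The function names and the numerical values of the $\lambda$'s are hypothetical and serve only for illustration.

```python
import itertools
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def quadratic_form_moments(lams, sigma, order=8):
    """E(Q) and E(Q^2) for Q = sum(lam_i * y_i^2), the y_i independent N(0, sigma^2)."""
    nodes, weights = hermegauss(order)             # weight function exp(-z^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)   # normalized so that they sum to one
    lams = np.asarray(lams, dtype=float)
    m1 = m2 = 0.0
    # Tensor-product rule over the len(lams) independent coordinates.
    for idx in itertools.product(range(order), repeat=len(lams)):
        idx = list(idx)
        y = sigma * nodes[idx]
        w = float(np.prod(weights[idx]))
        q = float(np.dot(lams, y ** 2))
        m1 += w * q
        m2 += w * q * q
    return m1, m2

def closed_form_moments(lams, sigma):
    """The expressions sigma^2 * sum(lam) and sigma^4 * {(sum lam)^2 + 2 sum lam^2}."""
    lams = np.asarray(lams, dtype=float)
    s1, s2 = lams.sum(), (lams ** 2).sum()
    return sigma ** 2 * s1, sigma ** 4 * (s1 ** 2 + 2.0 * s2)

numerical = quadratic_form_moments([1.0, 2.5, 0.5], sigma=1.5)
exact = closed_form_moments([1.0, 2.5, 0.5], sigma=1.5)
```

The pairs `numerical` and `exact` should coincide to rounding error.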

51 Consider now the coefficient $r_k$ defined by

$$r_k = \frac{\dfrac{1}{n-k}\displaystyle\sum_{i=1}^{n-k} x_i x_{i+k}}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} x_i^2}. \tag{134}$$

By the argument of Example 5 we see that the numerator and denominator of (134) are independent if the $x$'s are normal and independently distributed.



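We can now transform to new variables $y_1, \ldots, y_n$ such that the distribution remains as at (133) and

$$r_k = \frac{n}{n-k}\,\frac{\sum_{i=1}^{n} \lambda_i y_i^2}{\sum_{i=1}^{n} y_i^2} \tag{135}$$

where the $\lambda$'s are the roots of the equation (derived from (94))

$$\begin{vmatrix} -\lambda & \cdots & \tfrac{1}{2} & \cdots & 0 \\ \vdots & \ddots & & \ddots & \vdots \\ \tfrac{1}{2} & & -\lambda & & \tfrac{1}{2} \\ \vdots & \ddots & & \ddots & \vdots \\ 0 & \cdots & \tfrac{1}{2} & \cdots & -\lambda \end{vmatrix} = 0, \tag{136}$$

the elements $\tfrac{1}{2}$ standing in the $k$th diagonals above and below the principal diagonal, all other elements off that diagonal being zero.

Again we cannot find a simple expression for the distribution of $r_k$. However, the moments of the denominator in (135) are ascertainable from the distribution (129) and those of the numerator in the manner of section 50. Use of Example 5 then gives us the moments of $r_k$.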

This coefficient is known as the "serial correlation coefficient", in this case with lag $k$ and no parental correlation between successive values of the $x$'s. Slight variations of the form of definition (134) are encountered in time-series analysis. For example, if we write $x_1 = x_{n+1}$, $x_2 = x_{n+2}$, $\ldots$, $x_k = x_{n+k}$, the so-called "circular" coefficient $r_c$ of order $k$ may be defined by

$$r_{ck} = \frac{\sum_{i=1}^{n} x_i x_{i+k}}{\sum_{i=1}^{n} x_i^2}. \tag{137}$$

The usefulness of this result in ascertaining the distribution of $r_c$ is that the determinant (136) then becomes a circulant and the roots in $\lambda$ can be obtained explicitly.
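
The explicit roots are not written out in the text. By the standard theory of circulants they are $\cos(2\pi jk/n)$, $j = 0, 1, \ldots, n-1$, and the following sketch (Python, NumPy assumed, with illustrative function names of our own) compares these values with the latent roots computed numerically from the matrix of the circular form $\sum x_i x_{i+k}$.

```python
import numpy as np

def circular_lag_matrix(n, k):
    """Symmetric matrix A with x'Ax = sum_i x_i x_{i+k}, suffixes taken modulo n."""
    a = np.zeros((n, n))
    for i in range(n):
        j = (i + k) % n
        a[i, j] += 0.5
        a[j, i] += 0.5
    return a

def circulant_roots(n, k):
    """The values cos(2*pi*j*k/n), j = 0, ..., n-1."""
    j = np.arange(n)
    return np.cos(2.0 * np.pi * j * k / n)

a = circular_lag_matrix(7, 2)
computed = np.sort(np.linalg.eigvalsh(a))   # latent roots found numerically
explicit = np.sort(circulant_roots(7, 2))   # the explicit expression
```

Since the transformation to the $y$'s is orthogonal, $\sum x_i^2 = \sum y_i^2$ and the circular coefficient becomes $\sum \lambda_i y_i^2 / \sum y_i^2$ with these $\lambda$'s.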

"Student's"  t 

52 The ratio

$$t = \frac{(\bar{x} - \mu)\sqrt{n(n-1)}}{\{\sum (x - \bar{x})^2\}^{\frac{1}{2}}} \tag{138}$$

is known as "Student's" $t$. By the foregoing arguments we see that we can transform to the mean $(\mu, \ldots, \mu)$ without loss of generality; and that the denominator is independent of the numerator in normal variation.

If $O$ is the origin, $P$ the sample point and $Q$ the orthogonal projection of $P$ on to the line

$$x_1 = x_2 = \ldots = x_n \tag{139}$$

the length $OQ$ is $\sum x_i/\sqrt{n} = \bar{x}\sqrt{n}$ and $OP$ is $(\sum x_i^2)^{\frac{1}{2}}$. Thus

$$PQ^2 = \sum x_i^2 - n\bar{x}^2 = \sum (x_i - \bar{x})^2. \tag{140}$$

It follows that $t$ is constant over the surface $OQ/PQ$ = constant, i.e. over the "circular" cone subtending a fixed half-angle at the origin. Let this angle be $\phi$ and consider the weight between two cones given by $\phi$ and $\phi + d\phi$. This will be proportional to the annulus cut off on a fixed hypersphere (say, the unit sphere) with width $d\phi$ and radius $\sin\phi$, the content of which is proportional to $\sin^{n-2}\phi\, d\phi$. We thus have for the distribution of the angle $\phi$

$$dF \propto \sin^{n-2}\phi\, d\phi, \qquad 0 \le \phi \le \pi.$$

The distribution of

$$t = \frac{\bar{x}\sqrt{n}\,\sqrt{n-1}}{\sqrt{\sum (x - \bar{x})^2}} = \frac{OQ\,\sqrt{n-1}}{PQ} = (n-1)^{\frac{1}{2}} \cot\phi$$

is then given by

$$dF \propto \left(1 + \frac{t^2}{n-1}\right)^{-\frac{1}{2}n} dt, \qquad -\infty \le t \le \infty$$

and the constant is easily evaluated to give

$$dF = \frac{1}{\sqrt{n-1}\; B\{\tfrac{1}{2}(n-1), \tfrac{1}{2}\}} \left(1 + \frac{t^2}{n-1}\right)^{-\frac{1}{2}n} dt. \tag{141}$$
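
The identification of $t$ with $(n-1)^{\frac{1}{2}} \cot\phi$ can be illustrated on a small sample. The sketch below (Python with NumPy assumed; the function names and the data are hypothetical) computes $t$ once from the algebraic definition (138) and once from the angle between $OP$ and the line (139), the origin having been moved to $(\mu, \ldots, \mu)$.

```python
import math
import numpy as np

def t_from_formula(x, mu):
    """Student's t from the definition (138)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    return (xbar - mu) * math.sqrt(n * (n - 1)) / math.sqrt(((x - xbar) ** 2).sum())

def t_from_angle(x, mu):
    """sqrt(n-1) * cot(phi), phi being the angle between OP and the line x_1 = ... = x_n."""
    p = np.asarray(x, dtype=float) - mu      # origin moved to (mu, ..., mu)
    n = p.size
    e = np.ones(n) / math.sqrt(n)            # unit vector along the line (139)
    oq = float(p @ e)                        # signed length OQ
    pq = math.sqrt(float(p @ p) - oq ** 2)   # length PQ, by Pythagoras
    phi = math.atan2(pq, oq)                 # lies between 0 and pi
    return math.sqrt(n - 1) * math.cos(phi) / math.sin(phi)

sample = [4.1, 5.3, 3.8, 6.0, 5.2]
t_algebraic = t_from_formula(sample, mu=4.5)
t_geometric = t_from_angle(sample, mu=4.5)
```

The two values agree to rounding error, whatever sample and whatever $\mu$ are taken, provided the sample values are not all equal.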

The  mean  in  rectangular  variation 

53 We consider now a variable distributed over an interval with uniform frequency. Taking the interval to be unity, without loss of generality we may write the distribution as

$$dF = dx, \qquad 0 \le x \le 1. \tag{142}$$

In a sample of $n$ values the density will be unity over the interior of a hypercube $0 \le x_i \le 1$ and zero outside it. This discontinuity in density at the faces of the hypercube gives rise to some difficulty. The frequency function of the mean $\bar{x}$ = constant is given by the weight between the hyperplanes $\bar{x}$ and $\bar{x} + d\bar{x}$. The general hyperplane $\bar{x}$ = constant meets the boundary of the hypercube in a region bounded by flats but changing its shape as $\bar{x}$ increases.

We deal with this problem by introducing "marker" variables. The apexes (corners) of the hypercube may have 0 or 1 or 2 or ... $n$ coordinates equal to unity and the rest zero. The bounding hyperplanes of the cube, if extending indefinitely over positive values of the $x$'s, define orthants (analogous to the quadrant of two dimensions). Consider the orthants defined by $x_j \ge r_j$ where $r_j$ may be 0 or 1. These in total contain all the apexes of the cube and each apex is the corner of an orthant. Divide them into $(n+1)$ sets according as the corner contains 0 or 1 or 2 or ... $n$ coordinates equal to unity, i.e. as

$$\sum_{j=1}^{n} r_j = r = 0, 1, \ldots, n.$$

Let $Q_i$ be the number in the $i$th set.

For example, in three dimensions

$$Q_0 = 1, \quad Q_1 = 3, \quad Q_2 = 3, \quad Q_3 = 1, \quad \text{totalling 8 corners},$$

and in general $Q_p$ is the coefficient of $t^p$ in the binomial $(1+t)^n$. (Cf. sections 16 and 17.) There are $2^n$ corners and correspondingly $2^n$ orthants.

We now assign a density to each orthant over its whole domain from 0 or 1 to infinity. To each $Q_p$ we assign the density $(-1)^p$. That this may be negative need not worry us. Now let $P$ be a point (with non-negative $x$'s) such that $s$ of its coordinates are greater than unity. It will then belong to $\binom{s}{p}$ of the $Q_p$'s. Now for $s \ge 1$

$$\sum_{p=0}^{s} (-1)^p \binom{s}{p} = 0. \tag{143}$$

Hence the total density is zero everywhere so long as $s \ge 1$, i.e. the point lies outside the cube. If it lies on or inside the cube the density is unity.

Now let the segment of the hyperplane $\sum x = z$ lying in $Q_0$ have content $C_n(z)$. If this hyperplane also lies in some other orthant, say a $Q_r$, with $r$ unit coordinates, the content in $Q_r$ is $C_n(z - r)$. (If $z \le r$ this is zero.) Hence the total weight we require is

$$\sum_{r=0}^{k} (-1)^r \binom{n}{r} C_n(z - r), \tag{144}$$

where $k$ is the greatest integer less than $z$. It remains to find $C_n(z)$.

Now $C_n(z)$ lies in the hyperplane $\sum x$ = constant which meets the coordinate axes in a point with coordinate $z$. The content of the pyramid determined by the coordinate planes and $\sum x = z$ is (cf. (118)) $z^n/n!$. The content $C_n(z)$ of the face of the pyramid not lying in the coordinate planes is obtained by differentiating with respect to $z$, i.e. is $z^{n-1}/(n-1)!$. Thus we have for the frequency of $z$ in the class of orthants $Q_r$

$$\frac{1}{(n-1)!} \sum_{r=0}^{k} (-1)^r \binom{n}{r} (z - r)^{n-1}, \qquad k \le z < k+1. \tag{145}$$

For the mean $\bar{x} = z/n$ we have by a simple substitution

$$dF = \frac{n^n}{(n-1)!} \sum_{r=0}^{k} (-1)^r \binom{n}{r} \left(\bar{x} - \frac{r}{n}\right)^{n-1} d\bar{x}, \qquad \frac{k}{n} \le \bar{x} < \frac{k+1}{n}. \tag{146}$$
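
The frequency function (146) is readily evaluated. The following sketch, in plain Python, is one way of doing so; the function name is ours, and the check by the midpoint rule is only a rough numerical verification that the total frequency is unity.

```python
import math

def mean_density(xbar, n):
    """Frequency function (146) of the mean of n independent uniform (0, 1) values."""
    if not 0.0 <= xbar <= 1.0:
        return 0.0
    # k is the integer with k <= n * xbar < k + 1, capped at n - 1 for xbar = 1.
    k = min(int(math.floor(n * xbar)), n - 1)
    total = sum((-1) ** r * math.comb(n, r) * (xbar - r / n) ** (n - 1)
                for r in range(k + 1))
    return n ** n / math.factorial(n - 1) * total

# For n = 2 the density is the triangle 4 * xbar, then 4 * (1 - xbar).
left = mean_density(0.25, 2)
right = mean_density(0.75, 2)

# Midpoint-rule check that the total frequency is unity for n = 5.
steps = 2000
total_frequency = sum(mean_density((i + 0.5) / steps, 5) for i in range(steps)) / steps
```

For $n = 2$ the function reproduces the familiar triangular distribution, and for larger $n$ it is symmetrical about $\bar{x} = \tfrac{1}{2}$.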

The  correlation  coefficient  in  bivariate  normal  variation 

54 The bivariate normal distribution with zero means and unit variances is given by

$$dF = \frac{1}{2\pi(1-\rho^2)^{\frac{1}{2}}} \exp\left\{-\frac{1}{2(1-\rho^2)}(x^2 - 2\rho xy + y^2)\right\} dx\,dy. \tag{147}$$

We are interested in the distribution of the correlation coefficient, in a sample of $n$ values, defined by

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2\}^{\frac{1}{2}}}. \tag{148}$$

This statistic is independent of the origin and scale of $x$ and $y$, so we have lost no generality in using the form (147).

In Example 6 we saw that by performing a Helmert transformation on each of $x$ and $y$ we could reduce the joint frequency of a sample of $n$ to the form

$$k \exp\left\{-\frac{1}{2(1-\rho^2)}\left(\sum_{i=1}^{n-1} u_i^2 - 2\rho\sum_{i=1}^{n-1} u_i v_i + \sum_{i=1}^{n-1} v_i^2\right)\right\} du_1 \ldots du_{n-1}\, dv_1 \ldots dv_{n-1}$$

$$\times \exp\left\{-\frac{1}{2(1-\rho^2)}(u_n^2 - 2\rho u_n v_n + v_n^2)\right\} du_n\, dv_n, \tag{149}$$

where $u_n = \bar{x}$, $v_n = \bar{y}$ and the other $u$'s are independent of $u_n$ and $v_n$.

In this coordinate system we have

$$r = \frac{\sum_{i=1}^{n-1} u_i v_i}{\left(\sum_{i=1}^{n-1} u_i^2 \sum_{i=1}^{n-1} v_i^2\right)^{\frac{1}{2}}}. \tag{150}$$

We can accordingly consider the distribution of $r$ given by (150) in the distribution given by the first factor on the right in (149). Effectively we have removed the variation of means $\bar{x}$ and $\bar{y}$ from the picture at the expense of going down into one lower dimension.

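We will find the joint distribution of $r$ and of $s_1$ and $s_2$ given by

$$s_1^2 = \frac{1}{n-1}\sum_{i=1}^{n-1} u_i^2, \qquad s_2^2 = \frac{1}{n-1}\sum_{i=1}^{n-1} v_i^2. \tag{151}$$

The frequency function, from (149), is immediately written down as

$$k \exp\left\{-\frac{n-1}{2(1-\rho^2)}(s_1^2 - 2\rho r s_1 s_2 + s_2^2)\right\}. \tag{152}$$

Our only problem is to find how the differential element $du_1 \ldots du_{n-1}\, dv_1 \ldots dv_{n-1}$ is transformed in terms of $s_1$, $s_2$ and $r$.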

Consider two $S_{n-1}$, one for $u$ and one for $v$. We can picture these as superposed one on the other. $(n-1)s_1^2$ then represents the squared distance of the sample point $P$ (corresponding to $u$) from the origin $O$, $(n-1)s_2^2$ that of $Q$ (corresponding to $v$) from $O$. The coefficient $r$ of (150) is then seen to be equal to the cosine of the angle between $OP$ and $OQ$.

For an increase $ds_1$ the content of the hyperspherical shell between $s_1$ and $s_1 + ds_1$ is, since our surface is in $n-1$ dimensions, proportional to $s_1^{n-2}\, ds_1$. Now the angle arc cos $r$, say $\theta$, is independent of $s_1$ and $s_2$. Moreover (section 49) the variation of $\theta$ is orthogonal to that of $s_1$ and $s_2$; the content corresponding to increases $ds_1\, dr\, ds_2$ is then the product of the increases corresponding to each separately, i.e. to $s_1^{n-2} s_2^{n-2}\, ds_1\, ds_2$ times the increase due to variation $dr$. For fixed $OP$, the vector $OQ$ which makes an angle $\theta$ with it varies in the $v$-space over an annulus of radius $s_2\sqrt{n-1}\,\sin\theta$, and the contribution is therefore proportional to

$$\sin^{n-3}\theta\, d\theta = \sin^{n-4}\theta\, d\cos\theta = (1 - r^2)^{\frac{1}{2}(n-4)}\, dr.$$

Hence our total differential element is proportional to

$$s_1^{n-2} s_2^{n-2} (1 - r^2)^{\frac{1}{2}(n-4)}. \tag{153}$$

This, multiplied by the frequency of (152), gives us the joint distribution of $s_1$, $s_2$ and $r$. To find the distribution of $r$ alone we have to integrate out $s_1$ and $s_2$. For details the reader is referred to Kendall and Stuart, 1957, Vol. 1, p. 384.

Wishart's  distribution 

55 The procedure we have just followed for two-way variation, namely of finding the joint distribution of sample variances and covariances, may be extended to $p$-way variation. We consider a sample of $n$ of a $p$-variate complex, arrayed by

$$\begin{matrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np}. \end{matrix} \tag{154}$$

The multivariate normal distribution of $p$ variates $x_1, \ldots, x_p$ is given by

$$dF = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}p}} \exp\left\{-\frac{1}{2}\sum_{i,j=1}^{p} A_{ij} x_i x_j\right\} dx_1 \ldots dx_p \tag{155}$$

$$= \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}p}} \exp(-\tfrac{1}{2} X'AX)\, dx_1 \ldots dx_p, \tag{156}$$

where $(A_{ij})$ is the matrix which is inverse to the so-called correlation matrix $(\rho_{ij})$, $i, j = 1, 2, \ldots, p$. Here we have taken the $x$'s to have unit variances and zero means. It is readily verified that (155) reduces to the univariate and bivariate results of earlier examples when $p = 1, 2$.

For a sample of $n$ (denoting for convenience now summation over the sample by $S$) we have for the frequency function

$$\frac{|A|^{\frac{1}{2}n}}{(2\pi)^{\frac{1}{2}np}} \exp\left\{-\frac{1}{2}\sum_{k=1}^{n}\sum_{i,j=1}^{p} A_{ij} x_{ki} x_{kj}\right\}$$

$$= \frac{|A|^{\frac{1}{2}n}}{(2\pi)^{\frac{1}{2}np}} \exp\left\{-\frac{1}{2} S \sum A_{ij}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j) - \frac{n}{2}\sum A_{ij}\bar{x}_i \bar{x}_j\right\}. \tag{157}$$

Exactly as before we may, by a Helmert transformation, remove the means $\bar{x}$ from the picture and concentrate on the frequency

$$\frac{|A|^{\frac{1}{2}(n-1)}}{n^{\frac{1}{2}p}(2\pi)^{\frac{1}{2}(n-1)p}} \exp\left\{-\frac{1}{2}\sum_{k=1}^{n-1}\sum_{i,j=1}^{p} A_{ij} u_{ki} u_{kj}\right\}. \tag{158}$$

(In the ordinary way, we do not bother about preliminary constants in this class of work, being content to evaluate at the final stage by use of the fact that the total frequency must be unity. Here, however, a final integration would be formidable. We therefore attach

$$\frac{n^{\frac{1}{2}p}|A|^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}p}}$$

to the second factor in (157) to make the distribution of means have unit frequency, and remove it from the constant in (157), giving us the constant in (158).)

Put

$$\sum_{k=1}^{n-1} u_{ki} u_{kj} = n a_{ij}. \tag{159}$$

The quantities $a$ are then our sample variances and covariances. We have for the frequency (158)

$$\frac{|A|^{\frac{1}{2}(n-1)}}{n^{\frac{1}{2}p}(2\pi)^{\frac{1}{2}(n-1)p}} \exp\left\{-\frac{n}{2}\sum_{i,j=1}^{p} A_{ij} a_{ij}\right\}. \tag{160}$$

As in the bivariate case, our main problem arises in determining the differential element to be adjoined to this quantity.

Let us take $p$ flat spaces of $n-1$ dimensions, one for each $u$, and let the sample points be represented by $P_1, \ldots, P_p$. We consider the variation of $P_1$, then the variation of $P_2$ given $P_1$, then that of $P_3$ given $P_1$ and $P_2$, and so on. We then multiply these together to get the variation of $P_1, \ldots, P_p$. The point $O$ is the origin.

Consider, in fact, $P_m$ given $P_1, P_2, \ldots, P_{m-1}$. For fixed $OP_m$ and angles $P_m O P_1, P_m O P_2, \ldots, P_m O P_{m-1}$, $P_m$ lies on a hypersphere in $n-m$ dimensions. Regarding the spaces as superposed, let the length of the perpendicular from $P_m$ on to the $(m-1)$-flat determined by $O, P_1, \ldots, P_{m-1}$ be $t_m$. The content of variation of $P_m$ is then, from (111),

$$\frac{2\pi^{\frac{1}{2}(n-m)}\, t_m^{\,n-m-1}}{\Gamma\{\tfrac{1}{2}(n-m)\}}. \tag{161}$$


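We require the content due to variation perpendicular to this hypersphere. Consider the transformation, based on (159),

$$n a_{im} = \sum_{k=1}^{n-1} u_{km} u_{ki}, \qquad i = 1, 2, \ldots, m. \tag{162}$$

The Jacobian is

$$\begin{vmatrix} u_{11} & u_{21} & \cdots & u_{m1} \\ u_{12} & u_{22} & \cdots & u_{m2} \\ \vdots & \vdots & & \vdots \\ 2u_{1m} & 2u_{2m} & \cdots & 2u_{mm} \end{vmatrix} = 2v_m \tag{163}$$

where $v_m$ is the content of the parallelotope determined by $O, P_1, \ldots, P_m$. Moreover

$$v_m^2 = n^m |a_{ij}|, \qquad i, j = 1, 2, \ldots, m, \tag{164}$$

and

$$t_m = v_m / v_{m-1}. \tag{165}$$

Thus the differential element is

$$\frac{1}{2v_m}\prod_{i=1}^{m} d(n a_{im}),$$

and, adjoining this to (161), we have for the total element of variation in $P_m$, given $O, P_1, \ldots, P_{m-1}$,

$$\frac{\pi^{\frac{1}{2}(n-m)}\, v_m^{\,n-m-2}}{\Gamma\{\tfrac{1}{2}(n-m)\}\, v_{m-1}^{\,n-m-1}} \prod_{i=1}^{m} d(n a_{im}). \tag{166}$$

We now multiply expressions like this for $m = 1, 2, \ldots, p$, remembering that $v_0 = 1$. We find

$$\frac{\pi^{\frac{1}{4}p(2n-p-1)}\, v_p^{\,n-p-2}}{\prod_{k=1}^{p} \Gamma\{\tfrac{1}{2}(n-k)\}} \prod_{i \le j} d(n a_{ij}).$$

Finally, since from (164) $v_p^2 = n^p |a|$, we have for the required distribution

$$dF = \frac{(\tfrac{1}{2}n)^{\frac{1}{2}p(n-1)}\, |A|^{\frac{1}{2}(n-1)}\, |a|^{\frac{1}{2}(n-p-2)}}{\pi^{\frac{1}{4}p(p-1)} \prod_{k=1}^{p} \Gamma\{\tfrac{1}{2}(n-k)\}} \exp\left(-\frac{n}{2}\sum_{i,j} A_{ij} a_{ij}\right) \prod_{i \le j} da_{ij}. \tag{167}$$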


Correlations  as  angles 

56 In the case of a single variable we represent a set of $n$ values $x_1, x_2, \ldots, x_n$ as a point $P_1$ in $n$ dimensions, or equivalently as a vector $OP_1$ where $O$ is the origin. Similarly we can represent a set of $n$ values of a $p$-variate, specified by

$$(x_{ij}), \qquad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, p,$$

as $p$ points $P_1, \ldots, P_p$ or as $p$ vectors $OP_1, \ldots, OP_p$. In effect this is what we did in regarding spaces as "superposed" in deriving the Wishart distribution. Just as in the one-dimensional case our distribution is represented by a density of points, so in the $p$-dimensional case we may picture a density of sets of $p$ points or vectors.

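The length of the vector $OP_j$ is given by

$$OP_j^2 = \sum_{i=1}^{n} x_{ij}^2.$$

If we transfer $O$ to the mean of the $n$ values, that is to say, to the point whose $j$th coordinate is $\sum_{i=1}^{n} x_{ij}/n = \bar{x}_j$, the length becomes

$$OP_j^2 = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 = n \operatorname{var} x_j, \tag{168}$$

where $\operatorname{var} x_j$ is the variance of the variate $x_j$ in the set of $n$ values.

Furthermore

$$\cos P_j O P_k = \frac{\sum (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\{\sum (x_{ij} - \bar{x}_j)^2 \sum (x_{ik} - \bar{x}_k)^2\}^{\frac{1}{2}}}$$

$$= \text{correlation between } x_j \text{ and } x_k, \text{ say } r_{jk}. \tag{169}$$

Hence the variance and correlation structure of the $p$ variables (in this set of $n$) is represented by lengths and angles.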
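
The representation by lengths and angles is easily illustrated numerically. In the following sketch (Python, NumPy assumed) the data and the function names are hypothetical; the squared length of the centred vector is compared with $n$ times the variance as in (168), and the cosine of the angle between two centred vectors with the ordinary product-moment correlation as in (169).

```python
import math
import numpy as np

def centred(x):
    """The vector of deviations from the mean, i.e. OP with O moved to the mean."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()

def cosine_of_angle(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v) / math.sqrt(float(u @ u) * float(v @ v))

x1 = [2.0, 4.0, 5.0, 7.0, 9.0]
x2 = [1.0, 3.0, 2.0, 6.0, 8.0]

u1, u2 = centred(x1), centred(x2)
squared_length = float(u1 @ u1)      # (168): n times the variance (divisor n)
r12 = cosine_of_angle(u1, u2)        # (169): the correlation as a cosine
```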

57 If $r_{ij} = 1$ then the angle between the vectors is zero and one variate is a linear function of the other. If $r_{ij} = 0$ the variates are uncorrelated (in this set of $n$). It does not follow that the random variables from which they emanate are independent, for two reasons: first, two variables can be uncorrelated without being independent; second, the observed set of $n$ need not, and in general will not, exactly reproduce the parent value of the correlation. We may nevertheless conceive of the $(n-1)$-flat orthogonal to $OP_j$ as containing variation which is uncorrelated with $x_j$. In the particular case when the variation is multivariate-normal zero correlation implies independence. Thus, a knowledge of $x_j$ imposes no constraint on the variation of the other variables, which can vary in the $(n-1)$-flat just as if there were no $x_j$.

58 Consider three variables corresponding to $OP_1$, $OP_2$, $OP_3$ and project $OP_1$ and $OP_2$ on the $(n-1)$-flat perpendicular to $OP_3$. If the projected points are $Q_1$ and $Q_2$ the correlation between $x_1$ and $x_2$ in that flat is measured by the cosine of the angle $Q_1 O Q_2$. This is written $r_{12.3}$ and is interpreted as a measure of the relationship between $x_1$ and $x_2$ independently of their dependence on $x_3$. Its magnitude, from section 39, is given by

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\{(1 - r_{13}^2)(1 - r_{23}^2)\}^{\frac{1}{2}}}. \tag{170}$$

It is called a partial correlation.

Likewise, if we project on to a flat orthogonal to a further vector, say $x_4$, we derive a partial correlation of second order, $r_{12.34}$, expressing the relationship between $x_1$ and $x_2$ "when their dependence on $x_3$ and $x_4$ has been eliminated". And so on.
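
The agreement between the geometrical definition and formula (170) may be verified on any set of data. The sketch below (Python, NumPy assumed; data and function names are hypothetical) projects the centred vectors for $x_1$ and $x_2$ on the flat perpendicular to that for $x_3$ and compares the cosine of the angle between the projections with the value given by (170).

```python
import math
import numpy as np

def centred(x):
    x = np.asarray(x, dtype=float)
    return x - x.mean()

def cos_angle(u, v):
    return float(u @ v) / math.sqrt(float(u @ u) * float(v @ v))

def project_out(u, w):
    """Component of u in the flat orthogonal to w."""
    return u - (u @ w) / (w @ w) * w

def partial_by_projection(x1, x2, x3):
    """Cosine of the angle Q1 O Q2 after projection perpendicular to OP3."""
    u1, u2, u3 = centred(x1), centred(x2), centred(x3)
    return cos_angle(project_out(u1, u3), project_out(u2, u3))

def partial_by_formula(r12, r13, r23):
    """Formula (170)."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))

x1 = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
x2 = [1.0, 3.0, 2.0, 6.0, 8.0, 7.0]
x3 = [5.0, 3.0, 6.0, 2.0, 7.0, 4.0]

r12 = cos_angle(centred(x1), centred(x2))
r13 = cos_angle(centred(x1), centred(x3))
r23 = cos_angle(centred(x2), centred(x3))

geometric = partial_by_projection(x1, x2, x3)
algebraic = partial_by_formula(r12, r13, r23)
```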

59 For normal variation the surfaces of constant density for $x_i$ are hyperspheres centred at $O$. Parental correlations between the variables are represented by constraints on angles between the vectors. The whole representation, however, is invariant under a rotation of axes. Let us make such a rotation such that $OP_3$ is one of them. If we project the whole complex on to the $S_{n-1}$ which is orthogonal, we see that hyperspheres of constant density remain so and total correlations are replaced by partial correlations. Thus the sampling distribution of $r_{12.3}$ will be exactly the same as that of $r_{12}$ except that we are in one lower dimension. The reader may care to try to prove this proposition without geometrical reasoning; there is no more convincing demonstration of the power of the geometrical approach.

Regression and multiple correlation

60 Suppose now that each sample-member, in addition to bearing values of $x_1, \ldots, x_p$, also bears the values of a variate $y$; and that we are interested in the dependence of $y$ on the $x$'s.

The points $P_1$ to $P_p$, together with the origin, define a containing space of $p$ dimensions. This makes a certain angle (unique) with the vector $OY$. The cosine of this angle is called the coefficient of multiple correlation of $y$ on $x_1, x_2, \ldots, x_p$. It is usually denoted by $R$.

If $y$ is a linear function of the $x$'s, say

$$y = \beta_1 x_1 + \ldots + \beta_p x_p, \tag{171}$$

then $OY$ lies entirely in the space of $x$'s and $R$ is unity. If $y$ does not depend on the $x$'s $OY$ is orthogonal and $R = 0$. $R$ may assume any value in the range $0 \le R \le 1$. That it is a correlation may be seen from the fact that it is the cosine of an angle between $OY$ and the orthogonal projection of $OY$ on the space of $x$'s. It measures the closeness with which the linear representation of $y$ in terms of the $x$'s is realized. It is evident intuitively (and readily checked) that $R$ is the cosine of the least angle of the family between $OY$ and any vector $OX$ in the $x$-space, i.e. $R$ is a maximum. This consideration allows us to determine the $\beta$'s of equation (171). In fact we have to find them so as to maximize $R$ between $y$ and a linear function $\sum_{j=1}^{p} \beta_j x_j$. This is equivalent to minimizing the square of the distance from $Y$ to its projection on the $x$-space, i.e. to minimizing the length of the vector $y - \sum \beta_j x_j$, i.e. to minimizing

$$\sum_{\text{sample}} \left(y - \sum_{j=1}^{p} \beta_j x_j\right)^2 \tag{172}$$

which leads us to the familiar equations of least-squares regression theory.

61 $R$ is unaffected by the length of the vectors. Let us take them as unity. Consider the content of the parallelotope determined by $O, Y, P_1, \ldots, P_p$. Although we have defined these vectors in $n$ dimensions they effectively define a space of $p+1$ dimensions. Take a coordinate system in this space and let the coordinates of $P_j$ be $\xi_{ji}$, $i = 1, \ldots, p+1$, and those of $Y$ be $\eta_i$. The content of the parallelotope is then

$$C = \begin{vmatrix} \eta_1 & \xi_{11} & \xi_{21} & \cdots & \xi_{p1} \\ \eta_2 & \xi_{12} & \xi_{22} & \cdots & \xi_{p2} \\ \vdots & \vdots & \vdots & & \vdots \\ \eta_{p+1} & \xi_{1,p+1} & \xi_{2,p+1} & \cdots & \xi_{p,p+1} \end{vmatrix}. \tag{173}$$


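Multiplying this by its transpose and writing $r_{ij}$ for the correlation between $x_i$ and $x_j$ and $r_{yi}$ for that between $y$ and $x_i$, we have

$$C^2 = \begin{vmatrix} 1 & r_{y1} & r_{y2} & \cdots & r_{yp} \\ r_{y1} & 1 & r_{12} & \cdots & r_{1p} \\ r_{y2} & r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{yp} & r_{p1} & r_{p2} & \cdots & 1 \end{vmatrix}. \tag{174}$$

The $r$'s are invariant under coordinate transformation and can, therefore, equally well be calculated from the original $x$'s.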

Now if $\theta$ is the angle between $OY$ and the space of $x$'s, the content $C$ is $\sin\theta$ times the content of the parallelotope determined by $O, P_1, \ldots, P_p$, the square of which is the minor of the top left-hand element in (174). Thus we have

$$1 - R^2 = \sin^2\theta = \frac{|W|}{W_{11}} \tag{175}$$

where $W$ is the matrix whose determinant is given in (174) and $W_{11}$ is the minor of the element in the top left-hand corner.

62 We may also consider the content as obtained in a different way: by constructing the plane area $OYP_1$; then multiplying it by the length of the perpendicular from $P_2$ on to that space; then by the length of the perpendicular from $P_3$ on to the space $OYP_1P_2$, and so on. These perpendiculars are sines of angles, the cosine of each of which is a partial correlation. Thus we find

$$1 - R^2 = (1 - r_{y1}^2)(1 - r_{y2.1}^2)(1 - r_{y3.12}^2) \cdots (1 - r_{yp.12\ldots(p-1)}^2), \tag{176}$$

a decomposition of $1 - R^2$ which can be carried out in a number of different ways according to the order in which we select the $x$-vectors.
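
Relations (175) and (176) can be checked against a direct least-squares calculation. The following is a sketch only (Python, NumPy assumed), with hypothetical data for $p = 2$ and illustrative function names. It obtains $R^2$ from the ratio of determinants, from the orthogonal projection of the centred $y$-vector on the space of the centred $x$-vectors, and from the product of factors in (176).

```python
import math
import numpy as np

def centre(a):
    a = np.asarray(a, dtype=float)
    return a - a.mean(axis=0)

def r2_from_determinants(y, x):
    """R^2 from 1 - R^2 = |W| / W_11, W being the correlation matrix of (y, x_1, ..., x_p)."""
    w = np.corrcoef(np.column_stack([y, x]), rowvar=False)
    return 1.0 - np.linalg.det(w) / np.linalg.det(w[1:, 1:])

def r2_from_projection(y, x):
    """Squared cosine of the angle between OY and its projection on the space of x's."""
    yc, xc = centre(y), centre(x)
    beta, *_ = np.linalg.lstsq(xc, yc, rcond=None)   # least-squares coefficients
    fitted = xc @ beta                               # the projection of OY
    return float(fitted @ fitted) / float(yc @ yc)

def r2_from_partials(y, x):
    """R^2 from the decomposition (176), written out for p = 2 only."""
    w = np.corrcoef(np.column_stack([y, x]), rowvar=False)
    ry1, ry2, r12 = w[0, 1], w[0, 2], w[1, 2]
    ry2_1 = (ry2 - ry1 * r12) / math.sqrt((1.0 - ry1 ** 2) * (1.0 - r12 ** 2))
    return 1.0 - (1.0 - ry1 ** 2) * (1.0 - ry2_1 ** 2)

y = np.array([3.1, 4.0, 5.2, 6.1, 8.3, 9.0])
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 4.0]])

r2_det = r2_from_determinants(y, x)
r2_proj = r2_from_projection(y, x)
r2_part = r2_from_partials(y, x)
```

The three values should agree to rounding error.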

63 Consider now the distribution of $R$ for samples of $n$ from a multivariate normal distribution in which the variate represented by $OY$ is independent of the other variates. This means that in repeated sampling the vector $OY$ will be randomly orientated with respect to the $S_p$ of $x$'s. We therefore lose no generality by supposing the $x$'s fixed, namely the space $S_p$ fixed, and considering the distribution of the angle made with it by a randomly orientated vector $OY$; or equivalently of the angle determined by a point $Y$ moving on a hypersphere of unit radius. The angle $\theta$ is the one between $OY$ and, say, $OZ$ in the $S_p$, this being the minimum possible angle for varying positions of $Z$.

If OZ and $\theta$ are fixed, Y varies on the surface of a hypersphere
in $n - p - 1$ dimensions with content proportional to $(\sin\theta)^{n-p-2}$.
Z may vary independently in an $S_p$ on a hypersphere with
surface content proportional to $(\cos\theta)^{p-1}$. For $\theta$, therefore, we have

$$dF \propto (\sin\theta)^{n-p-2} (\cos\theta)^{p-1}\, d\theta.$$

Putting $R = \cos\theta$ we find

$$dF \propto R^{p-1} (1 - R^2)^{\frac{1}{2}(n-p-3)}\, dR$$

or, expressing the distribution in terms of $R^2$ and evaluating
the constant,

$$dF = \frac{1}{B\{\tfrac{1}{2}p,\ \tfrac{1}{2}(n-p-1)\}}\, (R^2)^{\frac{1}{2}(p-2)} (1 - R^2)^{\frac{1}{2}(n-p-3)}\, d(R^2). \qquad (177)$$
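
The form (177) is that of a Beta distribution for $R^2$ with parameters $\tfrac{1}{2}p$ and $\tfrac{1}{2}(n-p-1)$, and this may be checked by simulation. The sketch below, in Python, follows the geometrical argument: the x's are held fixed, y is resampled, and $R^2$ is computed as the squared cosine of the angle between OY and the $S_p$. The sample size, number of replications and seed are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, reps = 12, 3, 20000        # arbitrary illustrative values

# Fix the x's once; only y is resampled.
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)
Q, _ = np.linalg.qr(Xc)          # orthonormal basis of the S_p of x's (n x p)

Y = rng.normal(size=(reps, n))   # each row is an independent sample of y
Yc = Y - Y.mean(axis=1, keepdims=True)

# R^2 = cos^2(theta): squared length of the projection of OY on S_p
# divided by the squared length of OY.
proj = Yc @ Q
r2 = (proj**2).sum(axis=1) / (Yc**2).sum(axis=1)

a, b = p / 2, (n - p - 1) / 2    # Beta parameters implied by (177)
beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))
print(r2.mean(), beta_mean)      # simulated and theoretical mean of R^2
print(r2.var(), beta_var)        # simulated and theoretical variance of R^2
```

The simulated moments should lie close to the Beta values, the discrepancy being of the order of sampling error.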

64  If  the  parent  value  of  R  is  not  zero  a  more  complicated 
argument  is  necessary,  but  it  still  hinges  essentially  upon 
geometrical  considerations.  For  the  details  of  the  derivation 
see  Kendall  and  Stuart,  Vol.  1,  p.  339. 

Canonical  correlations 

65 In generalization of the regression of one variable y on a
set of others $x_1, \ldots, x_p$, we may consider the relationship between
a set of q variables $y_1, y_2, \ldots, y_q$ and $x_1, x_2, \ldots, x_p$. We shall not
develop  the  subject  here,  but  one  result  of  some  importance 
is  worth  mentioning  in  order  to  relate  it  to  the  discussion  of 
section  28  concerning  angles  between  flats. 

We may, in fact, regard the y's as corresponding to q vectors
in an $S_q$ and the x's as corresponding to p vectors in an $S_p$. The
correlation  relationships  between  the  two  spaces  depend  on 
angles  between  vectors  in  them.  We  may  state,  without  formal 
proof,  the  following  propositions,  which  bear  an  obvious 
analogy  to  the  results  of  section  28. 

It is possible to find linear transformations of x's to $\xi$'s and
y's to $\eta$'s such that (1) all the $\xi$'s are independent (i.e. the axes
in the p-space are orthogonal); (2) all the $\eta$'s are independent
(all the axes in the q-space are orthogonal); (3) all the $\xi$'s are
uncorrelated with all the $\eta$'s except for $p \le q$ correlations
between $(\xi_1, \eta_1), (\xi_2, \eta_2), \ldots, (\xi_p, \eta_p)$; and these are stationary
values  of  the  possible  correlations  between  a  vector  in  one 
space  and  a  vector  in  the  other.  These  correspond  to  the 
canonical  angles  between  the  spaces.  They  express,  in  the 
simplest  possible  form,  the  relationship  between  the  two  sets 
of  variables  and  are  known  as  "canonical  correlations". 
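
The canonical correlations may be computed as the cosines of the canonical angles between the two spaces. The sketch below, in Python, is one way of doing so and is not the method of the text: it forms orthonormal bases of the two spaces by QR decomposition and takes the singular values of the matrix of their inner products. The function name `canonical_correlations` and the simulated data are assumptions made for the illustration.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Cosines of the canonical angles between the spaces spanned by the
    mean-centred columns of X and of Y, largest first."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)     # orthonormal axes in the p-space
    Qy, _ = np.linalg.qr(Yc)     # orthonormal axes in the q-space
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))                       # p = 2, illustrative data
Y = np.column_stack([X[:, 0] + rng.normal(size=200),
                     rng.normal(size=200),
                     X[:, 1] - X[:, 0] + rng.normal(size=200)])  # q = 3
rho = canonical_correlations(X, Y)
print(rho)  # p canonical correlations, in decreasing order
```

Since the largest canonical correlation is the greatest correlation attainable between a vector in one space and a vector in the other, it cannot be less than any single correlation between an x and a y.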

Component analysis

66 Let us revert to the case of a set of p vectors in $S_n$. They
determine, as we have remarked, an $S_p$ which may be considered
as immersed in the $S_n$. In this $S_p$ we can find linear transformations
to new variables $y_1, \ldots, y_p$ which are orthogonal.
In fact, we can do so in more ways than one. Let us choose as
$y_1$ the axis for which the corresponding variable $\sum_k l_{1k} x_k$ has
the greatest variance, i.e. such that the sum of squares of the projections
of the vectors on to it is a maximum. Measuring, as usual, from
an origin at the means and taking the vector to have unit length,
we have to maximize

$$\sum_{i,j} r_{ij}\, l_{1i} l_{1j}$$

subject to $\sum_i l_{1i}^2 = 1$. Now this is the same problem that we
considered in section 34. We find the unconditioned maximum
of

$$\sum_{i,j} r_{ij}\, l_{1i} l_{1j} - \lambda \Big( \sum_i l_{1i}^2 - 1 \Big),$$

leading to

$$|r - \lambda I| = 0, \qquad (178)$$

where r is the correlation matrix $(r_{ij})$.

Thus the process of finding the major axes of the quadratic
form $\sum r_{ij} x_i x_j$ is the same as finding the orthogonal vectors,
linear  in  the  x's,  with  stationary  variances.  These  vectors  are 
called  principal  components.  For  an  extended  account  of  them 
see  Kendall's  Multivariate  Analysis,  in  this  Series. 
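
The determinantal equation (178) may be solved numerically as an eigenvalue problem. The sketch below, in Python, finds the latent roots and vectors of the correlation matrix and confirms that the variances of the resulting components equal the roots. The function name `principal_components`, the simulated data and the use of standardized variables are assumptions made for the illustration.

```python
import numpy as np

def principal_components(X):
    """Roots of |r - lambda I| = 0 and the corresponding unit vectors,
    ordered from the greatest root to the least."""
    r = np.corrcoef(X, rowvar=False)
    lam, L = np.linalg.eigh(r)          # eigh returns roots in ascending order
    order = np.argsort(lam)[::-1]
    return lam[order], L[:, order]

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated, illustrative

lam, L = principal_components(X)

# Standardize the x's (zero mean, unit variance) and form the components.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores = Z @ L

print(lam)                    # stationary variances, greatest first
print(scores.var(axis=0))     # should reproduce the roots
```

The roots sum to p, the number of variables, since each standardized variable contributes unit variance.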

67  The  above  examples  by  no  means  exhaust  the  applications 
of n-dimensional geometry in statistics. Equally effective use of
it  can  be  made  in  other  branches  of  multivariate  analysis.  In 
statistical  estimation,  however,  it  is  differential  geometry  which 
can  make  the  greatest  contribution.  Notwithstanding  some 
applications  of  geometry  in  the  large  to  problems  of  linear 
estimation  (cf.,  for  example,  Durbin  and  Kendall,  1951), 
estimation in general has not yet been explored from the
geometrical viewpoint. The reader who is interested may refer
to  papers  by  Huzurbazar  (1949)  and  Rao  (1960),  who  develop 
some  suggestive  results  concerning  the  curvature  of  the 
likelihood  surface  from  the  point  of  view  of  Riemannian 
geometry. 